1 of 66

Spatial Storage Models and Literature Survey �State of the Art and New Developments

Csci – 8715 Spatial Data Science Research

Acknowledgement: Yan Li

2 of 66

Learning Objectives

  • After this segment, students will be able to
    • Describe Spatial Data Storage Models
    • List main contributions and organize literature of following papers:
    • Research Skill: Literature Review and Articulating Novelty

2

3 of 66

Weekly Topic: Spatial Data Storage

3

Genre

File Structure

Indexing

Compression

Relational

Heap, Ordered, Hashed

B-tree, B+-tree

Hashing

JPEG, MP3

Spatial

Space filling curve

R-tree, R+/R* trees

Spatial Network

CCAM �Min-cut of edges

Spatiotemporal

Spatial Big Data

4 of 66

Common Relational Data Storage File Structures

  • Heap or unordered or unstructured
    • Ordered
    • Hashed
    • Clustered

5 of 66

Relational Data Storage Examples

5

B+ Tree

Search tree

6 of 66

Spatial Data Storage Examples

6

Spatial filling curve

Index

Key

Hilbert

0

A

3

1

D

10

2

E

16

3

H

25

4

G

39

5

F

44

6

C

50

7

B

57

R-tree

7 of 66

Spatial Network Storage Example

7

    • Which partitioning reduces disk I/O for graph operations?
      • Choice 1: Geometric partition
      • Choice 2: min-cut Graph Partition
      • Choice 2 cuts fewer edges and is preferred
      • Assuming uniform querying popularity across edges

8 of 66

Spatial Network Storage Example (cont’d)

    • Consider two disk-paging of Minneapolis major roads
      • Non-white edges => node pair in same page
      • White edge are cut-edges
      • Node partitions on right has fewer cut-edges and is preferred

9 of 66

Weekly Topic: Spatial Data Storage

9

Genre

File Structure

Indexing

Compression

Relational

Heap, Ordered, Hashed

B-tree, B+-tree

Hashing

JPEG, MP3

Spatial

Space filling curve

R-tree, R+/R* trees

Spatial Network

CCAM �Min-cut of edges

Trajic

Spatiotemporal

Snapshot,

Lagrangian,

Min-cut of hyper-edges

TPRtree, MVRtree, pyramid, …

REST: A Reference based framework for ST trajectory compression (G5 - Spring 2022)

Spatial Big Data

Distributed models

SpatialHadoop (R-tree on Hadoop), GPU …

10 of 66

Learning Objectives

  • After this segment, students will be able to
    • Describe Spatial Data Storage Models
    • List main contributions and organize literature of following papers:
    • Research Skill: Literature Review and Articulating Novelty

10

11 of 66

Recall Critical Reading Questions

  • Problem statement: Formally define the problem addressed in the paper. Briefly explain the significance of the problem.
  • List the major contributions of the paper.
  • List the key concepts behind the approach in this paper. 
  • What is the validation methodology (e.g. case studies, statistical hypothesis testing, proofs, simulation) used in this paper?
  • List the assumptions made by the authors. Critique an assumption that you believe is unreasonable.
  • If you were to rewrite this paper today, what would you preserve and what would be revise?

11

12 of 66

SSTN-G∀S Problem Statement

Input:

• A spatio-temporal network S

• A set of spatio-temporal operations O

e.g., Get_all_Successors(G∀S )

Output:

• Data Partitioning of S, across data pages

Objective:

• Minimize data page access for operations in O

Constraints:

• S is too large for storage in main memory.

• Temporal-edge attribute information needs to be preserved.

12

13 of 66

Significance

  • Societal applications requiring to store spatio-temporal networks
  • Surface and air transportation systems:
    • The Mobility Monitoring Program, Airline services,...

13

14 of 66

Major Contributions

  • Prove that the SSTN-G∀S problem is NP-hard
  • Approach: a novel storage and access method (LCP-G∀S)
  • Validation
    • Analyze the algebraic cost of the LCP-G∀S algorithm
    • Experimental evaluation of LCP-G∀S
      • Using real-world and synthetic STN datasets

15 of 66

Key Concepts

  • Spatio-Temporal Networks (STN)

  • Spatio-Temporal Network Operations:
    • FindNode(n, t),LGetNodeTransition(Node(n, t)),
    • LGetOneSuccessor(Node(n, t), ns)
    • LGetAllSuccessors(Node(n, t))

  • Lagrangian Paths:
    • paths in a spatio-temporal network

  • STN conceptual Models
    • Graph: G = (N,E,T), N: nodes, E: edges, T: time steps
    • Hypergraph = (N, hyperedges group each node with successors)

  • Data Structure:
    • node has node id, time step, time-varying attribute
    • edge time-varying attribute

16 of 66

Key Concepts

  • Disk storage
    • needs STN partitioning into disk pages

  • Partitioning Methods
    • Orthogonal
    • Non-Orthogonal Partitioning of STNs
      • Min-cuts edges (CCAM) or hyperedges (this work)

access data in memory

Search disk data pages

yes

no

node i

Load relevant disk page in memory

In-memory data page?

17 of 66

Validation Methodology

  • Proofs
    • SSTN-G∀S is NP-hard
    • Time Complexity

  • Experimental Evaluation
    • Accuracy of cost models to predict the disk I/Os for STN operations
    • What is the effect of the blocking factor? #time steps, edge/node ratio, …

17

18 of 66

Assumptions

  • Secondary indexing techniques are not considered
  • Did not consider Time dependent shortest-path
  • Did evaluate effect of Tabu-search on solution quality
  • Assumed constant time cost for insert, retrieve, delete, update
  • Assume that STN data has high time resolution with non-uniform STN properties

18

19 of 66

Revisions

  • Approximation algorithms.
  • Investigate novel indexing techniques for STNs.
  • Extend the Lagrangian reference framework to model hierarchical networks.
  • Developing parallel algorithms to reduce computation time.

19

20 of 66

Literature Review : Laundry list approach

20

21 of 66

Literature Review: Decision Table Approach

  • Long laundry list provide few insights.
  • Alternative: Classify the proposed method and related works

  • Decision Table Approach
    • Rows: Candidate methods
    • Columns: Criteria, e.g., Geometry/topology, Get_a_successor /Get_all_successor, …

21

Approach

Connectivity-based grouping

Non-Orthogonal

All successors

Logitudinal

Snapshot

Min-cut of edges

Min-Cut of hyperedges

22 of 66

Literature Review: Decision Tree Approach

min-cut of edges = LCP-G1S: Lagrangian-Connectivity Partitioning For LGetOneSuccessors()

Min-cut of hyperedges = LCP-G∀S: Lagrangian-Connectivity Partitioning For LGetAllSuccessors()

related work

preliminary

work

focus of

the paper

23 of 66

Related Work and Limitations (Revised)

LCP-G1S: Lagrangian-Connectivity Partitioning For LGetOneSuccessors()

LCP-G∀S: Lagrangian-Connectivity Partitioning For LGetAllSuccessors()

related work

preliminary work

focus of the paper

All successors Methods

 

Non-Orthogonal Methods

SnapShot,�Longitudinal

Min-cuts�(LCP-G1S)

Yes

Yes

No

No

24 of 66

Quiz

Q? Which literature review method best articulates novelty of proposed approach?

    • (a) Laundry List
    • (b) Decision Table
    • (c ) Decision Tree
    • (d) Decision Tree showing Proposed Work as a child of the root node

24

25 of 66

Quiz

25

1. What is one of the main motivations cited for researching storage methods for spatio-temporal networks?�(a) Entertainment applications like games.�(b) Scientific visualization.�(c) Societal applications like transportation systems.�(d) Geographic information systems��2. What challenge in handling Spatio-Temporal Network (STN) datasets is primarily addressed in the paper?�(a) Reducing the computational complexity of data analysis�(b) Minimizing disk I/O costs during STN operations�(c) Increasing the accuracy of real-time data tracking�(d) Enhancing the security of data transmission in STNs��3. What is one challenge identified for future work?�(a) Indexing techniques for STNs.�(b) Parallelization of LCP algorithms.�(c) Extending to semantically rich STNs.�(d) Continuous STN data models.

26 of 66

Learning Objectives

  • After this segment, students will be able to
    • Describe Spatial Data Storage Models
    • List main contributions and organize literature of following papers:
    • Research Skill: Literature Review and Articulating Novelty

26

27 of 66

Societal Importance

Huge value of trajectory data

  • Trajectories of phones, cars, airplanes, …
  • Ride-sharing, traffic speed estimation, …
  • Location based advertisement
  • 2011 McKinsey Big Data Report
    • 600 B annual value by 2020
  • Challenge: Big data volume, velocity, …

10 million cars, 1 point/second = 7 PB/year

  • Compression
    • Limited storage
    • Limited band width for data transmission

“Two Billion Cars: Driving Toward Sustainability” Daniel Sperling, Deborah Gordon

28 of 66

Problem Statement

  • Input:
    • A trajectory with n points, T = {t0, t1, …, tn-1},
    • An error margin e

    • Output :A compressed representation T’ = {t0’, t1’, …, tn-1’}

    • Objective
    • Minimize the compression ratio,

    • Constraint
    • On-line setting, i.e., O(n) time complexity
    • ? Meet error margin for Compression error for smooth trajectories

Storage space for T

Storage space for T’

29 of 66

Contributions

    • A new on-line algorithm
      • O(N), same as Delta compression
      • Using prediction methods
      • Temporally-aware linear predictor
      • New dynamic leading zero encoding scheme

30 of 66

Key concepts

  • Trajectory
    • A sequence of spatio-temporal points T = {t1,t2,t3,….}
    • Each point contains longitude, latitude, timestamp etc.

ti = (loni, lati, tsi)

  • Prediction of next point from previous points

  • Lossy compression vs. loss-less compression

  • Error margin
    • The maximum error that can be tolerated for compression

31 of 66

Key concepts�Prediction of Next Points from Previous ones

  • Ex. Linear spatial prediction

31

32 of 66

Example Comparing Trajic with Delta

Point(n)

X(n)

Y(n)

P(0)

100001

200002

P(1)

101001

201004

P(2)

102001

302007

P(3)

103001

103011

Point(n)

Delta X

Delta Y

P(1)

1000

1002

P(2)

1000

1003

P(3)

1000

1004

Point(n)

X(n)

Y(n)

P(0)

100001

200002

Point(n)

PX(n) error

PY(n) error

P1

0

2

P2

0

3

P3

0

4

Point(n)

X

Y

P0

100001

200002

Delta

PX(n) = X(n-1) + 1000

PY(n) = Y(n-1) + 1000

Trajectory

Trajic

Candidate

Large Numbers (8 bytes)

Medium Numbers

(4 bytes)

Small Numbers

(2 bytes)

Total Bytes

Trajectory

8

0

0

8*8 = 64

Delta

2

6

0

2*8 + 6*4 = 40

Trajic

2

0

6

2*8 + 6*2 = 28

33 of 66

Key concepts: System Architecture

  • System architecture to generate encoded residuals

33

34 of 66

Key concepts

  • High level pseudocode

34

35 of 66

Comparison with Douglas Peucker Algorithm

  • Douglas Peucker’s line generalization/simplification algorithm
    • Can reduce the number of points within margin approximation error
      • This can result in better compression ratio
    • However, Delta and Trajic keep all points in the trajectories
      • ?lossless compression

36 of 66

Critical Reading Exercise

Note: Computational complexity of Douglas Peucker line simplification method is N2 .

Literature has N*log(N) complexity for this problem.

Q? How can this paper provide O(N) algorithm? What gives?

Q? Is proposed algorithm correct and complete?

Will it find find a solution if there exists one meeting the error bound ?

Criteria

Delta

Trajic

Douglas Peucker

Chan 2012

Hersberger 1992

On-line

yes

yes

no

no

Preserve number of points

yes

yes

no

no

Time Complexity

O(N)

O(N)

O(N^2)

O(N^2)

O(N log(N)

Compression Quality

Better than Delta

Better if many points dropped

optimal

37 of 66

Validation methodology

  • Goal: What is the effect of error bound on compression ratio?

  • Real-world dataset
    • University of Illinois (213 trajectories of movements of two people)
    • Microsoft GeoLife (17,621 trajectories of walking, cycling and driving 2007-2011

  • Comparative analysis
    • TD-SED (a popular line generalization algorithm)
    • Delta compression
    • SQUISH (one pass algorithm)
    • Trajic (proposed algorithm)

38 of 66

Critical Reading: Comparing Claims with Evidence

Is this supported by evidence in paper?

How does Douglas Peucker etc. fit here?

39 of 66

Assumptions

  • Smooth trajectories, On-line use case, One-pass algorithms
  • No significant gaps in the trajectories
    • Ship trajectories
  • Did not model GPS inaccuracy
  • Frequency of length of residuals follows a distribution
    • The distribution can be correctly found (past predicts future)
    • Small residual because trajectories are smooth
  • 2D plane (Euclidean space) instead of Network space
  • Ignore multi-attributed trajectories
    • Onboard diagnostics (OBD) data from vehicles

40 of 66

Revisions

  • Combine orthogonal ideas (e.g., line simplification + bit compression)
  • Experiment should compare to the missing related work
    • More experiments on self-evaluation
    • Explain when the assumptions are valid in a discussion section.
  • Add a future work section

  • Add correctness and completeness analysis

  • Add missing related work, e.g.,
    • Ideal compression algorithm (Chan et al., 1996, Jr. of Computational Geometry)
    • A difficulty in interdisciplinary research
    • How to reduce risk?

41 of 66

Quiz

41

3. What was the primary motivation behind developing the Trajic compression system ?�(a) To create a more efficient GPS tracking system.�(b) To address limitations in existing trajectory compression methods, particularly regarding small error margins.�(c) To reduce the cost of data storage in large-scale spatial databases.�(d) To improve the accuracy of trajectory data analysis.�

4. What is a novel aspect of the Trajic compression system compared to previous trajectory compression methods ?�(a) It introduces an entirely new trajectory data format.�(b) It employs delta compression with a novel residual encoding scheme.�(c) It uses machine learning algorithms for prediction. �(d) It focuses solely on real-time trajectory data compression. ��5. What assumption does the Trajic system challenge or modify compared to traditional trajectory compression methods ? �(a) It assumes trajectory data is always linear and predictable.�(b) It assumes trajectory data is highly irregular and cannot be compressed efficiently.�(c) It challenges the assumption that high compression ratios require large error margins.�(d) It assumes all trajectory data must be stored in a lossless format.

42 of 66

Learning Objectives

  • After this segment, students will be able to
    • Describe Spatial Data Storage Models
    • List main contributions and organize literature of following papers:
    • Research Skill: Literature Review and Articulating Novelty

42

43 of 66

Outline

    • Critical Reading
    • Literature Review: A Reflection
      • Has anyone else publishes on this problem?
      • Resources for Literature Search (e.g., Google Scholar, DBLP, …)
      • Organizing literature review
        • Laundry List
        • Decision Table
        • Decision Tree

44 of 66

Reflection on Literature Review

    • Paper refers to an ”Ideal” compressed solution,
      • which is better than those found by proposed method.

    • Has anyone published algorithms to find the “ideal” compression ?
      • If so, should literature review cite those papers?

45 of 66

Literature review (gps trajectory compression)

46 of 66

Literature Review: GPS trajectory synonyms, e.g., curve

46

47 of 66

Resources for Literature Search

  • Sources
    • Databases allowing keyword search
      • Use many different keywords related to a concept
      • Ex. Google Scholar, DBLP, Citeseer
    • Journals, Conference Proceedings
      • Browse the tables of content
      • Ex. Digital Libraries from ACM, IEEE
    • Survey papers, overview articles
      • Dedicated source: ACM Computing surveys, wikipedia
      • Other sources: Many other journals may publish survey papers
    • Subject matter experts
      • Identify key researchers working in related areas
      • Seek advice on related work
    • Completeness

  • Reading at different levels
    • Review titles and abstracts of several papers
    • Identify critical elements like contributions
    • Select subset of relevant papers for detailed reading

48 of 66

A Simple Decision Table and Tree

48

Delta

Proposed

(Trajic)

Line Generalization

Criteria

Douglas Peucker

Chan 2012

Hersberger 1992

On-line?

yes

yes

no

no

Preserve #points?

yes

yes

no

no

Time Complexity

O(N)

O(N)

O(N^2)

O(N^2)

O(N log(N)

Compression Quality

Better than Delta

optimal

On-line O(N) ?

Delta

Line Generalization

Prediction

N

N

Y

Y

Proposed (Trajic)

Optimality?

Douglas Peucker

N

Y

Solution Quality: Chan 2012;

Compute Cost: Hershberger 1992

49 of 66

Section 2

    • 2.1 Line Generalization [4, 6, 10, 11, 13, 21]
    • 2.2 Delta Compression,
      • e.g., TrajStore [5]
    • 2.3 Other Methods:
      • road-network based [2, 3]
      • one-pass [7]
      • SQUISH [14]

50 of 66

Cited Literature in the Trajic Paper (pp. 14)

51 of 66

Summarizing Cited Literature: �Laundry List to Decision Table

51

Family

Support off road data

Disc-

ard points

Discard points

Complexity

Temporal

Predictor

Line generalization

Yes

Yes

Douglas Peucker [6]

Hershberger [8]

Openning window [11]

Sync. Euc. Distance [18]

Yes

Yes

Yes

Yes

O(N2)

O(N logN)

O(N2)

No

No

No

Yes

Yes

SQUISH [14]

O(N)

Delta compression

Yes

No

TrajsStore [5]

No

O(N)

Constant

Other

No

No

Use road network [2,3]

Yes

1-pass sampling [7]

Linear

Proposed (Trajic)

Yes

No

O(N)

Multiple predictor

Exercise: Create a decision tree using the following table

52 of 66

Decision Table to Decision Tree

52

Need road map

Line generalization

e.g., Douglas Peucker,

Hershburger, opening window

Map matching

One pass O(n)?

Y

Y

N

N

Discard data points

SQUISH

Predictor

Trajstore

1-pass sampling

Proposed Trajic

Y

N

constant

linear

richer

53 of 66

Best Practice: Literature Review

  • Reading at different levels
    • Review titles and abstracts of several papers
    • Identify critical elements like contributions
    • Select subset of relevant papers for detailed reading

54 of 66

Literature Survey

  • Sources
    • Databases allowing keyword search
      • Use many different keywords related to a concept
      • Ex. Google Scholar, DBLP, Citeseer
    • Journals, Conference Proceedings
      • Browse the tables of content
      • Ex. Digital Libraries from ACM, IEEE
    • Survey papers, overview articles
      • Dedicated source: ACM Computing surveys, wikipedia
      • Other sources: Many other journals may publish survey papers
    • Subject matter experts
      • Identify key researchers working in related areas
      • Seek advice on related work

55 of 66

Literature Survey (by author)

56 of 66

Literature Survey (by conference / journal)

57 of 66

Literature Survey (by topic)

58 of 66

Literature Survey Tools

59 of 66

Summary of Literature

  • Linear list style
    • Short paragraph summarizing each paper group
    • Highlight contributions in context of the problem under study
  • Classification
    • Provide a categorization of related work
    • Describing each category rather than individual papers
    • Examples - ACM Computing Surveys, many papers in the textbook
    • Form: table, tree
  • To articulate novelty, desirable properties for classifications
    • Articulates that contribution is first order
      • Binary classification separating proposed work from related work
      • Choice of classification criteria takes effort
      • Start with a decision table and then choose best column

60 of 66

Exercise 1

  • Q? Which organization technique is closest to the literature review (secc. 1.2) in “Lagrangian Approaches to Storage of Spatio-Temporal Network Datasets. Yang, K., +, IEEE Trans. On Knowledge and Data Eng., pp. 2222-2236 , Sept. 2014”?
    1. Linear list style
    2. Classification: Decision Table
    3. Classification: Decision Tree
    4. None of the above

Figure 2: Connectivity Based STN

Data Grouping Methods

61 of 66

Exercise 2

  • Q? Which organization technique is closest to the literature review in Trajic: An Effective Compression System for Trajectory Data. Nibali, A., +, IEEE Trans. On Knowledge and Data Eng., pp. 3138-3151, Nov. 2015”?
    1. Linear list style
    2. Classification: Decision Table
    3. Classification: Decision Tree
    4. None of the above

    • Recall Section 2
      • 2.1 Line Generalization
        • [4, 6, 10, 11, 13, 21]
      • 2.2 Delta Compression,
        • e.g., TrajStore [5]
      • 2.3 Other Methods:
        • road-network based [2, 3]
        • one-pass [7]
        • SQUISH [14]

62 of 66

Exercise 3

  1. First read draft literature review on next slide
  2. It a partial classification of papers followed by a laundary-list of a few papers.
  3. Develop a unified classification of all papers.
  4. Condense the draft text into one paragraph with a unified classification scheme.

63 of 66

This task aims to create a richer enumeration space containing all simple paths (i.e. paths that do not have a cycle or repetitive edges).

Our approach entails enumerating all simple paths and then evaluating them. Current approaches for enumerating all simple paths in a graph [142]–[150] can be roughly divided into three categories: (i) approaches that use the power of adjacency matrices [142]–[144], (ii) approaches that use depth/breadth first search (DFS/BFS) with backtracking [145]–[150] and (iii) approaches that use cut vertices (i.e. articulation points) and bi-connected components [150]. Power of adjacency-matrix based methods have high cost of matrix multiplications and equal edge weight assumptions, DFS/BFS with backtracking methods consider only a single node pair on each iteration, and the methods that use cut vertices assume there are enough cut vertices to reduce the enumeration cost. However, cut vertices are rare in transportation networks, e.g. roadmaps.

There are also some approaches dealing with multi-path on wireless network [151, 152). However, they aim at a different problem of delivering packets on multiple known paths to avoid excessive delay instead on paths enumeration. Another work which measures the betweenness centrality of a social network using random walks [153]. It uses random walk to simulate a rich set of possible paths beyond shortest path. However, it does not guarantee generating the complete set of simple path.

Pre-computing and network partitioning is popular in speeding up routing algorithms. Given a partitioned network, “edge-flag” approaches [154] store the information whether each edge whether it is in any shortest path to each fragment. When computing shortest path later, this information is used to reduce the search space. “Network contraction” approaches [155, 166] order and replace nodes by adding “shortcut edges” which represent the shortest distance between their end nodes. “Hierarchical” approaches [157,158] pre-compute and store shortest path inside each fragment and between each pair of boundary nodes. Later, the shortest path between nodes in two fragments is the path that minimizes Cost(S,Bs)+Cost(Bs,Bd)+Cost(Bd,D), where Bs and Bd are the boundary nodes. However, the speedup of all of these approaches come from eliminating information of non-shortest paths and thus substantially reduce the search space, which make them inapplicable to the all-simple-path enumeration problem.

Exploring simple-path based linear hotspots

64 of 66

[141] N. H. T. S. Administration, “Fatality analysis reporting system (fars) encyclopedia.” [Online]. Available: ftp://ftp.nhtsa.dot.gov/fars/.

[142] J. Ponstein, “Self-Avoiding Paths and the Adjacency Matrix of a Graph,” SIAM J. Appl. Math., vol. 14, no. 3, pp. 600–609, 1966.

[143] G. Danielson, “On finding the simple paths and circuits in a graph,” IEEE Trans. Circuit Theory, vol. 15, no. 3, pp. 294–295, 1968.

[144] F. Rubin, “Enumerating All Simple Paths in a Graph,” IEEE Trans. Circuits Syst., vol. 25, no. 8, pp. 641–642, 1978.

[145] J. C. Tiernan, “An efficient search algorithm to find the elementary circuits of a graph,” Commun. ACM, vol. 13, no. 12, pp. 722–726, 1970.

[146] D. B. Johnson, “Finding all the elementary circuits of a directed graph,” SIAM J. Comput., vol. 4, no. 1, pp. 77–84, 1975.

[147] J. L. Szwarcfiter and P. E. Lauer, “A search strategy for the elementary cycles of a directed graph,” Bit, vol. 16, no. 2, pp. 192–204, 1976.

[148] R. C. Read and R. E. Tarjan, “Bounds on backtrack algorithms for listing cycles, paths, and spanning trees,” Networks, vol. 5, no. 3, pp. 237–252, 1975.

[149] B. Banerjee and B. Chandrasekaran, “On some properties of the Voronoi diagram for planning multiple paths in free space,” Citeseer, vol. 3079, no. January, pp. 1–17, 2007.

[150] R. Ferreira, R. Grossi, and A. Marino, “Optimal Listing of Cycles and st-Paths in Undirected Graphs,” arXiv Prepr. arXiv …, pp. 1884–1896, 2012.

[151] A. Tsirigos and Z. J. Haas, “Multipath routing in the presence of frequent topological changes,” IEEE Commun. Mag., vol. 39, no. 11, pp. 132–138, 2001.

[152] A. Tsirigos and Z. J. Haas, “Analysis of multipath Routing-Part I: the effect on the packet delivery ratio,” IEEE Trans. Wirel. Commun., vol. 3, no. 1, pp. 138–146, 2004.

[153] M. E. J. Newman, “A measure of betweenness centrality based on random walks,” Soc. Networks, vol. 27, no. 1, pp. 39–54, 2005. Elsevier.

[154] U. Lauther, “An extremely fast, exact algorithm for finding shortest paths in static networks with geographical background,” Geoinf. und Mobilit{ä}t-von der Forsch. zur Prakt. Anwendung, vol. 22, pp. 219–230, 2004.

[155] D. Delling, A. V Goldberg, T. Pajor, and R. F. Werneck, “Customizable route planning,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2011, pp. 376–387.

[156] D. Delling and R. F. Werneck, “Faster customization of road networks,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2013, pp. 30–42.

[157] Jing, N., Huang, Y.W. and Rundensteiner, E.A., 1998. Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 10(3), pp.409-432.

[158] Shekhar, S., Fetterer, A. and Goyal, B., 1997, July. Materialization trade-offs in hierarchical shortest path algorithms. In International Symposium on Spatial Databases (pp. 94-111). Springer, Berlin, Heidelberg.

References

65 of 66

Solution

  • Task STI-1: Investigate simple-path based linear hotspots: This task aims to create a richer enumeration space containing all simple paths (i.e., paths that do not have a cycle or repetitive edges). Our approach entails enumerating all simple paths and evaluating their event density interest measure, which is defined as the ratio between the number of events in the path and path length. Current methods for enumerating all simple paths [84]–[92], [93], [94], [95] can be divided into five categories: (i) adjacency-matrix methods [84]–[86], (ii) depth/breadth first search (DFS/BFS) with backtracking [87]–[92], (iii) articulation point [92], (iv) random-walk [95], and (v) multi-path routing [93], [94]. Adjacency-matrix methods use a sequence of matrix multiplications terminated by discovering the longest simple path. DFS/BFS methods use one DFS/BFS traversal (with backtracking) for each node as the starting node with the same termination condition. The articulation point methods assume there are articulation nodes (i.e. nodes whose removal will partition the graph into disjoint sub-graphs), which are rare in transportation networks (e.g. roadmaps). There is additional literature on multi-path routing (e.g., edge-flag [96], Network contraction [97], [98]) and shortest path (e.g., Contraction Hierarchy [99]) discovery which do not enumerate many simple paths. Random-walk based approaches [95] may take a very long time to generate all simple paths, when probabilistic guarantees are not adequate.

66 of 66

[84] J. Ponstein, “Self-Avoiding Paths and the Adjacency Matrix of a Graph,” SIAM J. Appl. Math., vol. 14, no. 3, pp. 600–609, 1966.

[85] G. Danielson, “On finding the simple paths and circuits in a graph,” IEEE Trans. Circuit Theory, vol. 15, no. 3, pp. 294–295, 1968.

[86] F. Rubin, “Enumerating All Simple Paths in a Graph,” IEEE Trans. Circuits Syst., vol. 25, no. 8, pp. 641–642, 1978.

[87] J. C. Tiernan, “An efficient search algorithm to find the elementary circuits of a graph,” Commun. ACM, vol. 13, no. 12, pp. 722–726, 1970.

[88] D. B. Johnson, “Finding all the elementary circuits of a directed graph,” SIAM J. Comput., vol. 4, no. 1, pp. 77–84, 1975.

[89] J. L. Szwarcfiter and P. E. Lauer, “A search strategy for the elementary cycles of a directed graph,” Bit Numer. Math. Springer, vol. 16, no. 2, pp. 192–204, 1976.

[90] R. C. Read and R. E. Tarjan, “Bounds on backtrack algorithms for listing cycles, paths, and spanning trees,” Networks, Wiley Period. Inc, vol. 5, no. 3, pp. 237–252, 1975.

[91] B. Banerjee and B. Chandrasekaran, “On some properties of the Voronoi diagram for planning multiple paths in free space,” Elsevier, vol. 3079, no. January, pp. 1–17, 2007.

[92] G. Birmele, Etienne and Ferreira, Rui and Grossi, Roberto and Marino, Andrea and Pisanti, Nadia and Rizzi, Romeo and Sacomoto, “Optimal Listing of Cycles and st-Paths in Undirected Graphs,” in Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, SIAM Press, 2013, pp. 1884–1896.

[93] A. Tsirigos and Z. J. Haas, “Multipath routing in the presence of frequent topological changes,” IEEE Commun. Mag., vol. 39, no. 11, pp. 132–138, 2001.

[94] A. Tsirigos and Z. J. Haas, “Analysis of multipath Routing-Part I: the effect on the packet delivery ratio,” IEEE Trans. Wirel. Commun., vol. 3, no. 1, pp. 138–146, 2004.

[95] M. E. J. Newman, “A measure of betweenness centrality based on random walks,” Soc. Networks, vol. 27, no. 1, pp. 39–54, 2005. Elsevier.

[96] U. Lauther, “An extremely fast, exact algorithm for finding shortest paths in static networks with geographical background,” Geoinf. und Mobilit{ä}t-von der Forsch. zur Prakt. Anwendung, vol. 22, pp. 219–230, 2004.

[97] D. Delling, A. V Goldberg, T. Pajor, and R. F. Werneck, “Customizable route planning,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2011, pp. 376–387.

[98] D. Delling and R. F. Werneck, “Faster customization of road networks,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2013, pp. 30–42.

[99] F. Schulz, D. Wagner, and C. Zaroliagis, “Using multi-level graphs for timetable information in railway systems,” Algorithm Eng. Exp. Springer, pp. 43–59, 2002.

[100] M. E. J. Newman, “Spectral methods for community detection and graph partitioning,” Phys. Rev. E, American Physcal Society, vol. 88, no. 4, p. 42822, 2013.

References