3 of 66

Weekly Topic: Spatial Data Storage

Genre	File Structure	Indexing	Compression
Relational	Heap, Ordered, Hashed	B-tree, B⁺-tree Hashing	JPEG, MP3
Spatial	Space filling curve	R-tree, R⁺/R* trees
Spatial Network	CCAM �Min-cut of edges
Spatiotemporal
Spatial Big Data

4 of 66

Common Relational Data Storage File Structures

Heap or unordered or unstructured

Ordered
Hashed
Clustered

5 of 66

Relational Data Storage Examples

B+ Tree

Search tree

6 of 66

Spatial Data Storage Examples

Spatial filling curve

Index	Key	Hilbert
0	A	3
1	D	10
2	E	16
3	H	25
4	G	39
5	F	44
6	C	50
7	B	57

R-tree

7 of 66

Spatial Network Storage Example

Which partitioning reduces disk I/O for graph operations?

Choice 1: Geometric partition
Choice 2: min-cut Graph Partition
Choice 2 cuts fewer edges and is preferred
Assuming uniform querying popularity across edges

8 of 66

Spatial Network Storage Example (cont’d)

Consider two disk-paging of Minneapolis major roads

Non-white edges => node pair in same page
White edge are cut-edges
Node partitions on right has fewer cut-edges and is preferred

9 of 66

Weekly Topic: Spatial Data Storage

Genre	File Structure	Indexing	Compression
Relational	Heap, Ordered, Hashed	B-tree, B⁺-tree Hashing	JPEG, MP3
Spatial	Space filling curve	R-tree, R⁺/R* trees
Spatial Network	CCAM �Min-cut of edges		Trajic
Spatiotemporal	Snapshot, Lagrangian, Min-cut of hyper-edges	TPRtree, MVRtree, pyramid, …	REST: A Reference based framework for ST trajectory compression (G5 - Spring 2022)
Spatial Big Data	Distributed models	SpatialHadoop (R-tree on Hadoop), GPU …

10 of 66

Learning Objectives

After this segment, students will be able to

Describe Spatial Data Storage Models
List main contributions and organize literature of following papers:

Yang, KwangSoo, Michael R. Evans, Venkata MV Gunturi, James M. Kang, and Shashi Shekhar. "Lagrangian approaches to storage of spatio-temporal network datasets." IEEE Transactions on Knowledge and Data Engineering 26, no. 9 (2013): 2222-2236.
Nibali, Aiden, and Zhen He. "Trajic: An effective compression system for trajectory data." IEEE Transactions on Knowledge and Data Engineering 27, no. 11 (2015): 3138-3151.

Research Skill: Literature Review and Articulating Novelty

11 of 66

Recall Critical Reading Questions

Problem statement: Formally define the problem addressed in the paper. Briefly explain the significance of the problem.
List the major contributions of the paper.
List the key concepts behind the approach in this paper.
What is the validation methodology (e.g. case studies, statistical hypothesis testing, proofs, simulation) used in this paper?
List the assumptions made by the authors. Critique an assumption that you believe is unreasonable.
If you were to rewrite this paper today, what would you preserve and what would be revise?

12 of 66

SSTN-G∀S Problem Statement

Input:

• A spatio-temporal network S

• A set of spatio-temporal operations O

e.g., Get_all_Successors(G∀S )

Output:

• Data Partitioning of S, across data pages

Objective:

• Minimize data page access for operations in O

Constraints:

• S is too large for storage in main memory.

• Temporal-edge attribute information needs to be preserved.

13 of 66

Significance

Societal applications requiring to store spatio-temporal networks
Surface and air transportation systems:

The Mobility Monitoring Program, Airline services,...

14 of 66

Major Contributions

Prove that the SSTN-G∀S problem is NP-hard
Approach: a novel storage and access method (LCP-G∀S)
Validation

Analyze the algebraic cost of the LCP-G∀S algorithm
Experimental evaluation of LCP-G∀S

Using real-world and synthetic STN datasets

15 of 66

Key Concepts

Spatio-Temporal Networks (STN)

Spatio-Temporal Network Operations:

FindNode(n, t)，LGetNodeTransition(Node(n, t))，
LGetOneSuccessor(Node(n, t), n_s)
LGetAllSuccessors(Node(n, t))

Lagrangian Paths:

paths in a spatio-temporal network

STN conceptual Models

Graph: G = (N,E,T), N: nodes, E: edges, T: time steps
Hypergraph = (N, hyperedges group each node with successors)

Data Structure:

node has node id, time step, time-varying attribute
edge time-varying attribute

16 of 66

Key Concepts

Disk storage

needs STN partitioning into disk pages

Partitioning Methods

Orthogonal
Non-Orthogonal Partitioning of STNs

Min-cuts edges (CCAM) or hyperedges (this work)

access data in memory

Search disk data pages

yes

node i

Load relevant disk page in memory

In-memory data page?

17 of 66

Validation Methodology

Proofs

SSTN-G∀S is NP-hard
Time Complexity

Experimental Evaluation

Accuracy of cost models to predict the disk I/Os for STN operations
What is the effect of the blocking factor? #time steps, edge/node ratio, …

18 of 66

Assumptions

Secondary indexing techniques are not considered
Did not consider Time dependent shortest-path
Did evaluate effect of Tabu-search on solution quality
Assumed constant time cost for insert, retrieve, delete, update
Assume that STN data has high time resolution with non-uniform STN properties

19 of 66

Revisions

Approximation algorithms.
Investigate novel indexing techniques for STNs.
Extend the Lagrangian reference framework to model hierarchical networks.
Developing parallel algorithms to reduce computation time.

20 of 66

Literature Review : Laundry list approach

21 of 66

Literature Review: Decision Table Approach

Long laundry list provide few insights.
Alternative: Classify the proposed method and related works

Decision Table Approach

Rows: Candidate methods
Columns: Criteria, e.g., Geometry/topology, Get_a_successor /Get_all_successor, …

Approach	Connectivity-based grouping	Non-Orthogonal	All successors
Logitudinal	✔
Snapshot	✔
Min-cut of edges	✔	✔
Min-Cut of hyperedges	✔	✔	✔

22 of 66

Literature Review: Decision Tree Approach

min-cut of edges = LCP-G1S: Lagrangian-Connectivity Partitioning For LGetOneSuccessors()

Min-cut of hyperedges = LCP-G∀S: Lagrangian-Connectivity Partitioning For LGetAllSuccessors()

related work

preliminary

work

focus of

the paper

23 of 66

Related Work and Limitations (Revised)

LCP-G1S: Lagrangian-Connectivity Partitioning For LGetOneSuccessors()

LCP-G∀S: Lagrangian-Connectivity Partitioning For LGetAllSuccessors()

related work

preliminary work

focus of the paper

All successors Methods

Non-Orthogonal Methods

SnapShot,�Longitudinal

Min-cuts�(LCP-G1S)

Yes

24 of 66

Quiz

Q? Which literature review method best articulates novelty of proposed approach?

(a) Laundry List
(b) Decision Table
(c ) Decision Tree
(d) Decision Tree showing Proposed Work as a child of the root node

25 of 66

Quiz

1. What is one of the main motivations cited for researching storage methods for spatio-temporal networks?�(a) Entertainment applications like games.�(b) Scientific visualization.�(c) Societal applications like transportation systems.�(d) Geographic information systems��2. What challenge in handling Spatio-Temporal Network (STN) datasets is primarily addressed in the paper?�(a) Reducing the computational complexity of data analysis�(b) Minimizing disk I/O costs during STN operations�(c) Increasing the accuracy of real-time data tracking�(d) Enhancing the security of data transmission in STNs��3. What is one challenge identified for future work?�(a) Indexing techniques for STNs.�(b) Parallelization of LCP algorithms.�(c) Extending to semantically rich STNs.�(d) Continuous STN data models.

26 of 66

Learning Objectives

After this segment, students will be able to

Describe Spatial Data Storage Models
List main contributions and organize literature of following papers:

Yang, KwangSoo, Michael R. Evans, Venkata MV Gunturi, James M. Kang, and Shashi Shekhar. "Lagrangian approaches to storage of spatio-temporal network datasets." IEEE Transactions on Knowledge and Data Engineering 26, no. 9 (2013): 2222-2236.
Nibali, Aiden, and Zhen He. "Trajic: An effective compression system for trajectory data." IEEE Transactions on Knowledge and Data Engineering 27, no. 11 (2015): 3138-3151.

Research Skill: Literature Review and Articulating Novelty

27 of 66

Societal Importance

Huge value of trajectory data

Trajectories of phones, cars, airplanes, …
Ride-sharing, traffic speed estimation, …
Location based advertisement
2011 McKinsey Big Data Report

600 B annual value by 2020

Challenge: Big data volume, velocity, …

10 million cars, 1 point/second = 7 PB/year

Compression

Limited storage
Limited band width for data transmission

“Two Billion Cars: Driving Toward Sustainability” Daniel Sperling, Deborah Gordon

28 of 66

Problem Statement

Input:

A trajectory with n points, T = {t₀, t₁, …, t_n-1},
An error margin e

Output :A compressed representation T’ = {t₀’, t₁’, …, t_n-1’}

Objective
Minimize the compression ratio,

Constraint
On-line setting, i.e., O(n) time complexity
? Meet error margin for Compression error for smooth trajectories

Storage space for T

Storage space for T’

29 of 66

Contributions

A new on-line algorithm

O(N), same as Delta compression
Using prediction methods
Temporally-aware linear predictor
New dynamic leading zero encoding scheme

30 of 66

Key concepts

Trajectory

A sequence of spatio-temporal points T = {t₁,t₂,t₃,….}
Each point contains longitude, latitude, timestamp etc.

t_i = (lon_i, lat_i, ts_i)

Prediction of next point from previous points

Lossy compression vs. loss-less compression

Error margin

The maximum error that can be tolerated for compression

31 of 66

Key concepts�Prediction of Next Points from Previous ones

Ex. Linear spatial prediction

32 of 66

Example Comparing Trajic with Delta

Point(n)	X(n)	Y(n)
P(0)	100001	200002
P(1)	101001	201004
P(2)	102001	302007
P(3)	103001	103011

Point(n)	Delta X	Delta Y
P(1)	1000	1002
P(2)	1000	1003
P(3)	1000	1004

Point(n)	X(n)	Y(n)
P(0)	100001	200002

Point(n)	PX(n) error	PY(n) error
P1	0	2
P2	0	3
P3	0	4

Point(n)	X	Y
P0	100001	200002

Delta

PX(n) = X(n-1) + 1000

PY(n) = Y(n-1) + 1000

Trajectory

Trajic

Candidate	Large Numbers (8 bytes)	Medium Numbers (4 bytes)	Small Numbers (2 bytes)	Total Bytes
Trajectory	8	0	0	8*8 = 64
Delta	2	6	0	28 + 64 = 40
Trajic	2	0	6	28 + 62 = 28

33 of 66

Key concepts: System Architecture

System architecture to generate encoded residuals

34 of 66

Key concepts

High level pseudocode

35 of 66

Comparison with Douglas Peucker Algorithm

Douglas Peucker’s line generalization/simplification algorithm

Can reduce the number of points within margin approximation error

This can result in better compression ratio

However, Delta and Trajic keep all points in the trajectories

?lossless compression

36 of 66

Critical Reading Exercise

Note: Computational complexity of Douglas Peucker line simplification method is N² .

Literature has N*log(N) complexity for this problem.

Q? How can this paper provide O(N) algorithm? What gives?

Q? Is proposed algorithm correct and complete?

Will it find find a solution if there exists one meeting the error bound ?

Criteria	Delta	Trajic	Douglas Peucker	Chan 2012	Hersberger 1992
On-line	yes	yes	no	no
Preserve number of points	yes	yes	no	no
Time Complexity	O(N)	O(N)	O(N^2)	O(N^2)	O(N log(N)
Compression Quality		Better than Delta	Better if many points dropped	optimal

37 of 66

Validation methodology

Goal: What is the effect of error bound on compression ratio?

Real-world dataset

University of Illinois (213 trajectories of movements of two people)
Microsoft GeoLife (17,621 trajectories of walking, cycling and driving 2007-2011

Comparative analysis

TD-SED (a popular line generalization algorithm)
Delta compression
SQUISH (one pass algorithm)
Trajic (proposed algorithm)

38 of 66

Critical Reading: Comparing Claims with Evidence

Is this supported by evidence in paper?

How does Douglas Peucker etc. fit here?

39 of 66

Assumptions

Smooth trajectories, On-line use case, One-pass algorithms
No significant gaps in the trajectories

Ship trajectories

Did not model GPS inaccuracy
Frequency of length of residuals follows a distribution

The distribution can be correctly found (past predicts future)
Small residual because trajectories are smooth

2D plane (Euclidean space) instead of Network space
Ignore multi-attributed trajectories

Onboard diagnostics (OBD) data from vehicles

40 of 66

Revisions

Combine orthogonal ideas (e.g., line simplification + bit compression)
Experiment should compare to the missing related work

More experiments on self-evaluation
Explain when the assumptions are valid in a discussion section.

Add a future work section

Add correctness and completeness analysis

Add missing related work, e.g.,

Ideal compression algorithm (Chan et al., 1996, Jr. of Computational Geometry)
A difficulty in interdisciplinary research
How to reduce risk?

41 of 66

Quiz

3. What was the primary motivation behind developing the Trajic compression system ?�(a) To create a more efficient GPS tracking system.�(b) To address limitations in existing trajectory compression methods, particularly regarding small error margins.�(c) To reduce the cost of data storage in large-scale spatial databases.�(d) To improve the accuracy of trajectory data analysis.�

4. What is a novel aspect of the Trajic compression system compared to previous trajectory compression methods ?�(a) It introduces an entirely new trajectory data format.�(b) It employs delta compression with a novel residual encoding scheme.�(c) It uses machine learning algorithms for prediction. �(d) It focuses solely on real-time trajectory data compression. ��5. What assumption does the Trajic system challenge or modify compared to traditional trajectory compression methods ? �(a) It assumes trajectory data is always linear and predictable.�(b) It assumes trajectory data is highly irregular and cannot be compressed efficiently.�(c) It challenges the assumption that high compression ratios require large error margins.�(d) It assumes all trajectory data must be stored in a lossless format.

42 of 66

Learning Objectives

After this segment, students will be able to

Describe Spatial Data Storage Models
List main contributions and organize literature of following papers:

Yang, KwangSoo, Michael R. Evans, Venkata MV Gunturi, James M. Kang, and Shashi Shekhar. "Lagrangian approaches to storage of spatio-temporal network datasets." IEEE Transactions on Knowledge and Data Engineering 26, no. 9 (2013): 2222-2236.
Nibali, Aiden, and Zhen He. "Trajic: An effective compression system for trajectory data." IEEE Transactions on Knowledge and Data Engineering 27, no. 11 (2015): 3138-3151.

Research Skill: Literature Review and Articulating Novelty

43 of 66

Outline

Critical Reading
Literature Review: A Reflection

Has anyone else publishes on this problem?
Resources for Literature Search (e.g., Google Scholar, DBLP, …)
Organizing literature review

Laundry List
Decision Table
Decision Tree

44 of 66

Reflection on Literature Review

Paper refers to an ”Ideal” compressed solution,

which is better than those found by proposed method.

Has anyone published algorithms to find the “ideal” compression ?

If so, should literature review cite those papers?

45 of 66

Literature review (gps trajectory compression)

46 of 66

Literature Review: GPS trajectory synonyms, e.g., curve

47 of 66

Resources for Literature Search

Sources

Databases allowing keyword search

Use many different keywords related to a concept
Ex. Google Scholar, DBLP, Citeseer

Journals, Conference Proceedings

Browse the tables of content
Ex. Digital Libraries from ACM, IEEE

Survey papers, overview articles

Dedicated source: ACM Computing surveys, wikipedia
Other sources: Many other journals may publish survey papers

Subject matter experts

Identify key researchers working in related areas
Seek advice on related work

Completeness

Reading at different levels

Review titles and abstracts of several papers
Identify critical elements like contributions
Select subset of relevant papers for detailed reading

48 of 66

A Simple Decision Table and Tree

	Delta	Proposed (Trajic)	Line Generalization
Criteria	Delta	Proposed (Trajic)	Douglas Peucker	Chan 2012	Hersberger 1992
On-line?	yes	yes	no	no
Preserve #points?	yes	yes	no	no
Time Complexity	O(N)	O(N)	O(N^2)	O(N^2)	O(N log(N)
Compression Quality		Better than Delta		optimal

On-line O(N) ?

Delta

Line Generalization

Prediction

Proposed (Trajic)

Optimality?

Douglas Peucker

Solution Quality: Chan 2012;

Compute Cost: Hershberger 1992

49 of 66

Section 2

2.1 Line Generalization [4, 6, 10, 11, 13, 21]
2.2 Delta Compression,

e.g., TrajStore [5]

2.3 Other Methods:

road-network based [2, 3]
one-pass [7]
SQUISH [14]

50 of 66

Cited Literature in the Trajic Paper (pp. 14)

51 of 66

Summarizing Cited Literature: �Laundry List to Decision Table

Family	Support off road data	Disc- ard points		Discard points	Complexity	Temporal	Predictor
Line generalization	Yes	Yes	Douglas Peucker [6] Hershberger [8] Openning window [11] Sync. Euc. Distance [18]	Yes Yes Yes Yes	O(N²) O(N logN) O(N²)	No No No Yes
		Yes	SQUISH [14]		O(N)
Delta compression	Yes	No	TrajsStore [5]	No	O(N)		Constant
Other	No	No	Use road network [2,3]
	Yes		1-pass sampling [7]				Linear
Proposed (Trajic)	Yes			No	O(N)		Multiple predictor

Exercise: Create a decision tree using the following table

52 of 66

Decision Table to Decision Tree

Need road map

Line generalization

e.g., Douglas Peucker,

Hershburger, opening window

Map matching

One pass O(n)?

Discard data points

SQUISH

Predictor

Trajstore

1-pass sampling

Proposed Trajic

constant

linear

richer

53 of 66

Best Practice: Literature Review

Reading at different levels

Review titles and abstracts of several papers
Identify critical elements like contributions
Select subset of relevant papers for detailed reading

54 of 66

Literature Survey

Sources

Databases allowing keyword search

Use many different keywords related to a concept
Ex. Google Scholar, DBLP, Citeseer

Journals, Conference Proceedings

Browse the tables of content
Ex. Digital Libraries from ACM, IEEE

Survey papers, overview articles

Dedicated source: ACM Computing surveys, wikipedia
Other sources: Many other journals may publish survey papers

Subject matter experts

Identify key researchers working in related areas
Seek advice on related work

55 of 66

Literature Survey (by author)

56 of 66

Literature Survey (by conference / journal)

57 of 66

Literature Survey (by topic)

58 of 66

Literature Survey Tools

59 of 66

Summary of Literature

Linear list style

Short paragraph summarizing each paper group
Highlight contributions in context of the problem under study

Classification

Provide a categorization of related work
Describing each category rather than individual papers
Examples - ACM Computing Surveys, many papers in the textbook
Form: table, tree

To articulate novelty, desirable properties for classifications

Articulates that contribution is first order

Binary classification separating proposed work from related work
Choice of classification criteria takes effort
Start with a decision table and then choose best column

60 of 66

Exercise 1

Q? Which organization technique is closest to the literature review (secc. 1.2) in “Lagrangian Approaches to Storage of Spatio-Temporal Network Datasets. Yang, K., +, IEEE Trans. On Knowledge and Data Eng., pp. 2222-2236 , Sept. 2014”?

Linear list style
Classification: Decision Table
Classification: Decision Tree
None of the above

Figure 2: Connectivity Based STN

Data Grouping Methods

61 of 66

Exercise 2

Q? Which organization technique is closest to the literature review in “Trajic: An Effective Compression System for Trajectory Data. Nibali, A., +, IEEE Trans. On Knowledge and Data Eng., pp. 3138-3151, Nov. 2015”?

Linear list style
Classification: Decision Table
Classification: Decision Tree
None of the above

Recall Section 2

2.1 Line Generalization

[4, 6, 10, 11, 13, 21]

2.2 Delta Compression,

e.g., TrajStore [5]

2.3 Other Methods:

road-network based [2, 3]
one-pass [7]
SQUISH [14]

62 of 66

Exercise 3

First read draft literature review on next slide
It a partial classification of papers followed by a laundary-list of a few papers.
Develop a unified classification of all papers.
Condense the draft text into one paragraph with a unified classification scheme.

63 of 66

This task aims to create a richer enumeration space containing all simple paths (i.e. paths that do not have a cycle or repetitive edges).

Our approach entails enumerating all simple paths and then evaluating them. Current approaches for enumerating all simple paths in a graph [142]–[150] can be roughly divided into three categories: (i) approaches that use the power of adjacency matrices [142]–[144], (ii) approaches that use depth/breadth first search (DFS/BFS) with backtracking [145]–[150] and (iii) approaches that use cut vertices (i.e. articulation points) and bi-connected components [150]. Power of adjacency-matrix based methods have high cost of matrix multiplications and equal edge weight assumptions, DFS/BFS with backtracking methods consider only a single node pair on each iteration, and the methods that use cut vertices assume there are enough cut vertices to reduce the enumeration cost. However, cut vertices are rare in transportation networks, e.g. roadmaps.

There are also some approaches dealing with multi-path on wireless network [151, 152). However, they aim at a different problem of delivering packets on multiple known paths to avoid excessive delay instead on paths enumeration. Another work which measures the betweenness centrality of a social network using random walks [153]. It uses random walk to simulate a rich set of possible paths beyond shortest path. However, it does not guarantee generating the complete set of simple path.

Pre-computing and network partitioning is popular in speeding up routing algorithms. Given a partitioned network, “edge-flag” approaches [154] store the information whether each edge whether it is in any shortest path to each fragment. When computing shortest path later, this information is used to reduce the search space. “Network contraction” approaches [155, 166] order and replace nodes by adding “shortcut edges” which represent the shortest distance between their end nodes. “Hierarchical” approaches [157,158] pre-compute and store shortest path inside each fragment and between each pair of boundary nodes. Later, the shortest path between nodes in two fragments is the path that minimizes Cost(S,Bs)+Cost(Bs,Bd)+Cost(Bd,D), where Bs and Bd are the boundary nodes. However, the speedup of all of these approaches come from eliminating information of non-shortest paths and thus substantially reduce the search space, which make them inapplicable to the all-simple-path enumeration problem.

Exploring simple-path based linear hotspots

64 of 66

[141] N. H. T. S. Administration, “Fatality analysis reporting system (fars) encyclopedia.” [Online]. Available: ftp://ftp.nhtsa.dot.gov/fars/.

[142] J. Ponstein, “Self-Avoiding Paths and the Adjacency Matrix of a Graph,” SIAM J. Appl. Math., vol. 14, no. 3, pp. 600–609, 1966.

[143] G. Danielson, “On finding the simple paths and circuits in a graph,” IEEE Trans. Circuit Theory, vol. 15, no. 3, pp. 294–295, 1968.

[144] F. Rubin, “Enumerating All Simple Paths in a Graph,” IEEE Trans. Circuits Syst., vol. 25, no. 8, pp. 641–642, 1978.

[145] J. C. Tiernan, “An efficient search algorithm to find the elementary circuits of a graph,” Commun. ACM, vol. 13, no. 12, pp. 722–726, 1970.

[146] D. B. Johnson, “Finding all the elementary circuits of a directed graph,” SIAM J. Comput., vol. 4, no. 1, pp. 77–84, 1975.

[147] J. L. Szwarcfiter and P. E. Lauer, “A search strategy for the elementary cycles of a directed graph,” Bit, vol. 16, no. 2, pp. 192–204, 1976.

[148] R. C. Read and R. E. Tarjan, “Bounds on backtrack algorithms for listing cycles, paths, and spanning trees,” Networks, vol. 5, no. 3, pp. 237–252, 1975.

[149] B. Banerjee and B. Chandrasekaran, “On some properties of the Voronoi diagram for planning multiple paths in free space,” Citeseer, vol. 3079, no. January, pp. 1–17, 2007.

[150] R. Ferreira, R. Grossi, and A. Marino, “Optimal Listing of Cycles and st-Paths in Undirected Graphs,” arXiv Prepr. arXiv …, pp. 1884–1896, 2012.

[151] A. Tsirigos and Z. J. Haas, “Multipath routing in the presence of frequent topological changes,” IEEE Commun. Mag., vol. 39, no. 11, pp. 132–138, 2001.

[152] A. Tsirigos and Z. J. Haas, “Analysis of multipath Routing-Part I: the effect on the packet delivery ratio,” IEEE Trans. Wirel. Commun., vol. 3, no. 1, pp. 138–146, 2004.

[153] M. E. J. Newman, “A measure of betweenness centrality based on random walks,” Soc. Networks, vol. 27, no. 1, pp. 39–54, 2005. Elsevier.

[154] U. Lauther, “An extremely fast, exact algorithm for finding shortest paths in static networks with geographical background,” Geoinf. und Mobilit{ä}t-von der Forsch. zur Prakt. Anwendung, vol. 22, pp. 219–230, 2004.

[155] D. Delling, A. V Goldberg, T. Pajor, and R. F. Werneck, “Customizable route planning,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2011, pp. 376–387.

[156] D. Delling and R. F. Werneck, “Faster customization of road networks,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2013, pp. 30–42.

[157] Jing, N., Huang, Y.W. and Rundensteiner, E.A., 1998. Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 10(3), pp.409-432.

[158] Shekhar, S., Fetterer, A. and Goyal, B., 1997, July. Materialization trade-offs in hierarchical shortest path algorithms. In International Symposium on Spatial Databases (pp. 94-111). Springer, Berlin, Heidelberg.

References

65 of 66

Solution

Task STI-1: Investigate simple-path based linear hotspots: This task aims to create a richer enumeration space containing all simple paths (i.e., paths that do not have a cycle or repetitive edges). Our approach entails enumerating all simple paths and evaluating their event density interest measure, which is defined as the ratio between the number of events in the path and path length. Current methods for enumerating all simple paths [84]–[92], [93], [94], [95] can be divided into five categories: (i) adjacency-matrix methods [84]–[86], (ii) depth/breadth first search (DFS/BFS) with backtracking [87]–[92], (iii) articulation point [92], (iv) random-walk [95], and (v) multi-path routing [93], [94]. Adjacency-matrix methods use a sequence of matrix multiplications terminated by discovering the longest simple path. DFS/BFS methods use one DFS/BFS traversal (with backtracking) for each node as the starting node with the same termination condition. The articulation point methods assume there are articulation nodes (i.e. nodes whose removal will partition the graph into disjoint sub-graphs), which are rare in transportation networks (e.g. roadmaps). There is additional literature on multi-path routing (e.g., edge-flag [96], Network contraction [97], [98]) and shortest path (e.g., Contraction Hierarchy [99]) discovery which do not enumerate many simple paths. Random-walk based approaches [95] may take a very long time to generate all simple paths, when probabilistic guarantees are not adequate.

66 of 66

[84] J. Ponstein, “Self-Avoiding Paths and the Adjacency Matrix of a Graph,” SIAM J. Appl. Math., vol. 14, no. 3, pp. 600–609, 1966.

[85] G. Danielson, “On finding the simple paths and circuits in a graph,” IEEE Trans. Circuit Theory, vol. 15, no. 3, pp. 294–295, 1968.

[86] F. Rubin, “Enumerating All Simple Paths in a Graph,” IEEE Trans. Circuits Syst., vol. 25, no. 8, pp. 641–642, 1978.

[87] J. C. Tiernan, “An efficient search algorithm to find the elementary circuits of a graph,” Commun. ACM, vol. 13, no. 12, pp. 722–726, 1970.

[88] D. B. Johnson, “Finding all the elementary circuits of a directed graph,” SIAM J. Comput., vol. 4, no. 1, pp. 77–84, 1975.

[89] J. L. Szwarcfiter and P. E. Lauer, “A search strategy for the elementary cycles of a directed graph,” Bit Numer. Math. Springer, vol. 16, no. 2, pp. 192–204, 1976.

[90] R. C. Read and R. E. Tarjan, “Bounds on backtrack algorithms for listing cycles, paths, and spanning trees,” Networks, Wiley Period. Inc, vol. 5, no. 3, pp. 237–252, 1975.

[91] B. Banerjee and B. Chandrasekaran, “On some properties of the Voronoi diagram for planning multiple paths in free space,” Elsevier, vol. 3079, no. January, pp. 1–17, 2007.

[92] G. Birmele, Etienne and Ferreira, Rui and Grossi, Roberto and Marino, Andrea and Pisanti, Nadia and Rizzi, Romeo and Sacomoto, “Optimal Listing of Cycles and st-Paths in Undirected Graphs,” in Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, SIAM Press, 2013, pp. 1884–1896.

[93] A. Tsirigos and Z. J. Haas, “Multipath routing in the presence of frequent topological changes,” IEEE Commun. Mag., vol. 39, no. 11, pp. 132–138, 2001.

[94] A. Tsirigos and Z. J. Haas, “Analysis of multipath Routing-Part I: the effect on the packet delivery ratio,” IEEE Trans. Wirel. Commun., vol. 3, no. 1, pp. 138–146, 2004.

[95] M. E. J. Newman, “A measure of betweenness centrality based on random walks,” Soc. Networks, vol. 27, no. 1, pp. 39–54, 2005. Elsevier.

[96] U. Lauther, “An extremely fast, exact algorithm for finding shortest paths in static networks with geographical background,” Geoinf. und Mobilit{ä}t-von der Forsch. zur Prakt. Anwendung, vol. 22, pp. 219–230, 2004.

[97] D. Delling, A. V Goldberg, T. Pajor, and R. F. Werneck, “Customizable route planning,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2011, pp. 376–387.

[98] D. Delling and R. F. Werneck, “Faster customization of road networks,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2013, pp. 30–42.

[99] F. Schulz, D. Wagner, and C. Zaroliagis, “Using multi-level graphs for timetable information in railway systems,” Algorithm Eng. Exp. Springer, pp. 43–59, 2002.

[100] M. E. J. Newman, “Spectral methods for community detection and graph partitioning,” Phys. Rev. E, American Physcal Society, vol. 88, no. 4, p. 42822, 2013.

References