Spatial Storage Models and Literature Survey �State of the Art and New Developments
Csci – 8715 Spatial Data Science Research
Acknowledgement: Yan Li
Learning Objectives
2
Weekly Topic: Spatial Data Storage
3
Genre | File Structure | Indexing | Compression |
Relational | Heap, Ordered, Hashed | B-tree, B+-tree Hashing | JPEG, MP3 |
Spatial | Space filling curve | R-tree, R+/R* trees | |
Spatial Network | CCAM �Min-cut of edges | | |
Spatiotemporal | | | |
Spatial Big Data | | | |
Common Relational Data Storage File Structures
Relational Data Storage Examples
5
B+ Tree
Search tree
Spatial Data Storage Examples
6
Spatial filling curve
Index | Key | Hilbert |
0 | A | 3 |
1 | D | 10 |
2 | E | 16 |
3 | H | 25 |
4 | G | 39 |
5 | F | 44 |
6 | C | 50 |
7 | B | 57 |
R-tree
Spatial Network Storage Example
7
Spatial Network Storage Example (cont’d)
Weekly Topic: Spatial Data Storage
9
Genre | File Structure | Indexing | Compression |
Relational | Heap, Ordered, Hashed | B-tree, B+-tree Hashing | JPEG, MP3 |
Spatial | Space filling curve | R-tree, R+/R* trees | |
Spatial Network | CCAM �Min-cut of edges | | Trajic |
Spatiotemporal | Snapshot, Lagrangian, Min-cut of hyper-edges | TPRtree, MVRtree, pyramid, … | REST: A Reference based framework for ST trajectory compression (G5 - Spring 2022) |
Spatial Big Data | Distributed models | SpatialHadoop (R-tree on Hadoop), GPU … | |
Learning Objectives
10
Recall Critical Reading Questions
11
SSTN-G∀S Problem Statement
Input:
• A spatio-temporal network S
• A set of spatio-temporal operations O
e.g., Get_all_Successors(G∀S )
Output:
• Data Partitioning of S, across data pages
Objective:
• Minimize data page access for operations in O
Constraints:
• S is too large for storage in main memory.
• Temporal-edge attribute information needs to be preserved.
12
Significance
13
Major Contributions
Key Concepts
Key Concepts
access data in memory
Search disk data pages
yes
no
node i
Load relevant disk page in memory
In-memory data page?
Validation Methodology
17
Assumptions
18
Revisions
19
Literature Review : Laundry list approach
20
Literature Review: Decision Table Approach
21
Approach | Connectivity-based grouping | Non-Orthogonal | All successors |
Logitudinal | ✔ | | |
Snapshot | ✔ | | |
Min-cut of edges | ✔ | ✔ | |
Min-Cut of hyperedges | ✔ | ✔ | ✔ |
Literature Review: Decision Tree Approach
min-cut of edges = LCP-G1S: Lagrangian-Connectivity Partitioning For LGetOneSuccessors()
Min-cut of hyperedges = LCP-G∀S: Lagrangian-Connectivity Partitioning For LGetAllSuccessors()
related work
preliminary
work
focus of
the paper
Related Work and Limitations (Revised)
LCP-G1S: Lagrangian-Connectivity Partitioning For LGetOneSuccessors()
LCP-G∀S: Lagrangian-Connectivity Partitioning For LGetAllSuccessors()
related work
preliminary work
focus of the paper
All successors Methods
Non-Orthogonal Methods
SnapShot,�Longitudinal
Min-cuts�(LCP-G1S)
Yes
Yes
No
No
Quiz
Q? Which literature review method best articulates novelty of proposed approach?
24
Quiz
25
1. What is one of the main motivations cited for researching storage methods for spatio-temporal networks?�(a) Entertainment applications like games.�(b) Scientific visualization.�(c) Societal applications like transportation systems.�(d) Geographic information systems��2. What challenge in handling Spatio-Temporal Network (STN) datasets is primarily addressed in the paper?�(a) Reducing the computational complexity of data analysis�(b) Minimizing disk I/O costs during STN operations�(c) Increasing the accuracy of real-time data tracking�(d) Enhancing the security of data transmission in STNs��3. What is one challenge identified for future work?�(a) Indexing techniques for STNs.�(b) Parallelization of LCP algorithms.�(c) Extending to semantically rich STNs.�(d) Continuous STN data models.
Learning Objectives
26
Societal Importance
Huge value of trajectory data
10 million cars, 1 point/second = 7 PB/year
“Two Billion Cars: Driving Toward Sustainability” Daniel Sperling, Deborah Gordon
Problem Statement
Storage space for T
Storage space for T’
Contributions
Key concepts
ti = (loni, lati, tsi)
Key concepts�Prediction of Next Points from Previous ones
31
Example Comparing Trajic with Delta
Point(n) | X(n) | Y(n) |
P(0) | 100001 | 200002 |
P(1) | 101001 | 201004 |
P(2) | 102001 | 302007 |
P(3) | 103001 | 103011 |
Point(n) | Delta X | Delta Y |
P(1) | 1000 | 1002 |
P(2) | 1000 | 1003 |
P(3) | 1000 | 1004 |
Point(n) | X(n) | Y(n) |
P(0) | 100001 | 200002 |
Point(n) | PX(n) error | PY(n) error |
P1 | 0 | 2 |
P2 | 0 | 3 |
P3 | 0 | 4 |
Point(n) | X | Y |
P0 | 100001 | 200002 |
Delta
PX(n) = X(n-1) + 1000
PY(n) = Y(n-1) + 1000
Trajectory
Trajic
Candidate | Large Numbers (8 bytes) | Medium Numbers (4 bytes) | Small Numbers (2 bytes) | Total Bytes |
Trajectory | 8 | 0 | 0 | 8*8 = 64 |
Delta | 2 | 6 | 0 | 2*8 + 6*4 = 40 |
Trajic | 2 | 0 | 6 | 2*8 + 6*2 = 28 |
Key concepts: System Architecture
33
Key concepts
34
Comparison with Douglas Peucker Algorithm
Critical Reading Exercise
Note: Computational complexity of Douglas Peucker line simplification method is N2 .
Literature has N*log(N) complexity for this problem.
Q? How can this paper provide O(N) algorithm? What gives?
Q? Is proposed algorithm correct and complete?
Will it find find a solution if there exists one meeting the error bound ?
Criteria | Delta | Trajic | Douglas Peucker | Chan 2012 | Hersberger 1992 |
On-line | yes | yes | no | no | |
Preserve number of points | yes | yes | no | no | |
Time Complexity | O(N) | O(N) | O(N^2) | O(N^2) | O(N log(N) |
Compression Quality | | Better than Delta | Better if many points dropped | optimal | |
Validation methodology
Critical Reading: Comparing Claims with Evidence
Is this supported by evidence in paper?
How does Douglas Peucker etc. fit here?
Assumptions
Revisions
Quiz
41
3. What was the primary motivation behind developing the Trajic compression system ?�(a) To create a more efficient GPS tracking system.�(b) To address limitations in existing trajectory compression methods, particularly regarding small error margins.�(c) To reduce the cost of data storage in large-scale spatial databases.�(d) To improve the accuracy of trajectory data analysis.�
4. What is a novel aspect of the Trajic compression system compared to previous trajectory compression methods ?�(a) It introduces an entirely new trajectory data format.�(b) It employs delta compression with a novel residual encoding scheme.�(c) It uses machine learning algorithms for prediction. �(d) It focuses solely on real-time trajectory data compression. ��5. What assumption does the Trajic system challenge or modify compared to traditional trajectory compression methods ? �(a) It assumes trajectory data is always linear and predictable.�(b) It assumes trajectory data is highly irregular and cannot be compressed efficiently.�(c) It challenges the assumption that high compression ratios require large error margins.�(d) It assumes all trajectory data must be stored in a lossless format.
Learning Objectives
42
Outline
Reflection on Literature Review
Literature review (gps trajectory compression)
Literature Review: GPS trajectory synonyms, e.g., curve
46
Resources for Literature Search
A Simple Decision Table and Tree
48
| Delta | Proposed (Trajic) | Line Generalization | ||
Criteria | Douglas Peucker | Chan 2012 | Hersberger 1992 | ||
On-line? | yes | yes | no | no | |
Preserve #points? | yes | yes | no | no | |
Time Complexity | O(N) | O(N) | O(N^2) | O(N^2) | O(N log(N) |
Compression Quality | | Better than Delta | | optimal | |
On-line O(N) ?
Delta
Line Generalization
Prediction
N
N
Y
Y
Proposed (Trajic)
Optimality?
Douglas Peucker
N
Y
Solution Quality: Chan 2012;
Compute Cost: Hershberger 1992
Section 2
Cited Literature in the Trajic Paper (pp. 14)
Summarizing Cited Literature: �Laundry List to Decision Table
51
Family | Support off road data | Disc- ard points | | Discard points | Complexity | Temporal | Predictor |
Line generalization | Yes | Yes | Douglas Peucker [6] Hershberger [8] Openning window [11] Sync. Euc. Distance [18] | Yes Yes Yes Yes | O(N2) O(N logN) O(N2) | No No No Yes | |
| | Yes | SQUISH [14] | | O(N) | | |
Delta compression | Yes | No | TrajsStore [5] | No | O(N) | | Constant |
Other | No | No | Use road network [2,3] | | | | |
| Yes | | 1-pass sampling [7] | | | | Linear |
Proposed (Trajic) | Yes | | | No | O(N) | | Multiple predictor |
Exercise: Create a decision tree using the following table
Decision Table to Decision Tree
52
Need road map
Line generalization
e.g., Douglas Peucker,
Hershburger, opening window
Map matching
One pass O(n)?
Y
Y
N
N
Discard data points
SQUISH
Predictor
Trajstore
1-pass sampling
Proposed Trajic
Y
N
constant
linear
richer
Best Practice: Literature Review
Literature Survey
Literature Survey (by author)
Literature Survey (by conference / journal)
Literature Survey (by topic)
Literature Survey Tools
Summary of Literature
Exercise 1
Figure 2: Connectivity Based STN
Data Grouping Methods
Exercise 2
Exercise 3
This task aims to create a richer enumeration space containing all simple paths (i.e. paths that do not have a cycle or repetitive edges).
Our approach entails enumerating all simple paths and then evaluating them. Current approaches for enumerating all simple paths in a graph [142]–[150] can be roughly divided into three categories: (i) approaches that use the power of adjacency matrices [142]–[144], (ii) approaches that use depth/breadth first search (DFS/BFS) with backtracking [145]–[150] and (iii) approaches that use cut vertices (i.e. articulation points) and bi-connected components [150]. Power of adjacency-matrix based methods have high cost of matrix multiplications and equal edge weight assumptions, DFS/BFS with backtracking methods consider only a single node pair on each iteration, and the methods that use cut vertices assume there are enough cut vertices to reduce the enumeration cost. However, cut vertices are rare in transportation networks, e.g. roadmaps.
There are also some approaches dealing with multi-path on wireless network [151, 152). However, they aim at a different problem of delivering packets on multiple known paths to avoid excessive delay instead on paths enumeration. Another work which measures the betweenness centrality of a social network using random walks [153]. It uses random walk to simulate a rich set of possible paths beyond shortest path. However, it does not guarantee generating the complete set of simple path.
Pre-computing and network partitioning is popular in speeding up routing algorithms. Given a partitioned network, “edge-flag” approaches [154] store the information whether each edge whether it is in any shortest path to each fragment. When computing shortest path later, this information is used to reduce the search space. “Network contraction” approaches [155, 166] order and replace nodes by adding “shortcut edges” which represent the shortest distance between their end nodes. “Hierarchical” approaches [157,158] pre-compute and store shortest path inside each fragment and between each pair of boundary nodes. Later, the shortest path between nodes in two fragments is the path that minimizes Cost(S,Bs)+Cost(Bs,Bd)+Cost(Bd,D), where Bs and Bd are the boundary nodes. However, the speedup of all of these approaches come from eliminating information of non-shortest paths and thus substantially reduce the search space, which make them inapplicable to the all-simple-path enumeration problem.
Exploring simple-path based linear hotspots
[141] N. H. T. S. Administration, “Fatality analysis reporting system (fars) encyclopedia.” [Online]. Available: ftp://ftp.nhtsa.dot.gov/fars/.
[142] J. Ponstein, “Self-Avoiding Paths and the Adjacency Matrix of a Graph,” SIAM J. Appl. Math., vol. 14, no. 3, pp. 600–609, 1966.
[143] G. Danielson, “On finding the simple paths and circuits in a graph,” IEEE Trans. Circuit Theory, vol. 15, no. 3, pp. 294–295, 1968.
[144] F. Rubin, “Enumerating All Simple Paths in a Graph,” IEEE Trans. Circuits Syst., vol. 25, no. 8, pp. 641–642, 1978.
[145] J. C. Tiernan, “An efficient search algorithm to find the elementary circuits of a graph,” Commun. ACM, vol. 13, no. 12, pp. 722–726, 1970.
[146] D. B. Johnson, “Finding all the elementary circuits of a directed graph,” SIAM J. Comput., vol. 4, no. 1, pp. 77–84, 1975.
[147] J. L. Szwarcfiter and P. E. Lauer, “A search strategy for the elementary cycles of a directed graph,” Bit, vol. 16, no. 2, pp. 192–204, 1976.
[148] R. C. Read and R. E. Tarjan, “Bounds on backtrack algorithms for listing cycles, paths, and spanning trees,” Networks, vol. 5, no. 3, pp. 237–252, 1975.
[149] B. Banerjee and B. Chandrasekaran, “On some properties of the Voronoi diagram for planning multiple paths in free space,” Citeseer, vol. 3079, no. January, pp. 1–17, 2007.
[150] R. Ferreira, R. Grossi, and A. Marino, “Optimal Listing of Cycles and st-Paths in Undirected Graphs,” arXiv Prepr. arXiv …, pp. 1884–1896, 2012.
[151] A. Tsirigos and Z. J. Haas, “Multipath routing in the presence of frequent topological changes,” IEEE Commun. Mag., vol. 39, no. 11, pp. 132–138, 2001.
[152] A. Tsirigos and Z. J. Haas, “Analysis of multipath Routing-Part I: the effect on the packet delivery ratio,” IEEE Trans. Wirel. Commun., vol. 3, no. 1, pp. 138–146, 2004.
[153] M. E. J. Newman, “A measure of betweenness centrality based on random walks,” Soc. Networks, vol. 27, no. 1, pp. 39–54, 2005. Elsevier.
[154] U. Lauther, “An extremely fast, exact algorithm for finding shortest paths in static networks with geographical background,” Geoinf. und Mobilit{ä}t-von der Forsch. zur Prakt. Anwendung, vol. 22, pp. 219–230, 2004.
[155] D. Delling, A. V Goldberg, T. Pajor, and R. F. Werneck, “Customizable route planning,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2011, pp. 376–387.
[156] D. Delling and R. F. Werneck, “Faster customization of road networks,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2013, pp. 30–42.
[157] Jing, N., Huang, Y.W. and Rundensteiner, E.A., 1998. Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 10(3), pp.409-432.
[158] Shekhar, S., Fetterer, A. and Goyal, B., 1997, July. Materialization trade-offs in hierarchical shortest path algorithms. In International Symposium on Spatial Databases (pp. 94-111). Springer, Berlin, Heidelberg.
References
Solution
[84] J. Ponstein, “Self-Avoiding Paths and the Adjacency Matrix of a Graph,” SIAM J. Appl. Math., vol. 14, no. 3, pp. 600–609, 1966.
[85] G. Danielson, “On finding the simple paths and circuits in a graph,” IEEE Trans. Circuit Theory, vol. 15, no. 3, pp. 294–295, 1968.
[86] F. Rubin, “Enumerating All Simple Paths in a Graph,” IEEE Trans. Circuits Syst., vol. 25, no. 8, pp. 641–642, 1978.
[87] J. C. Tiernan, “An efficient search algorithm to find the elementary circuits of a graph,” Commun. ACM, vol. 13, no. 12, pp. 722–726, 1970.
[88] D. B. Johnson, “Finding all the elementary circuits of a directed graph,” SIAM J. Comput., vol. 4, no. 1, pp. 77–84, 1975.
[89] J. L. Szwarcfiter and P. E. Lauer, “A search strategy for the elementary cycles of a directed graph,” Bit Numer. Math. Springer, vol. 16, no. 2, pp. 192–204, 1976.
[90] R. C. Read and R. E. Tarjan, “Bounds on backtrack algorithms for listing cycles, paths, and spanning trees,” Networks, Wiley Period. Inc, vol. 5, no. 3, pp. 237–252, 1975.
[91] B. Banerjee and B. Chandrasekaran, “On some properties of the Voronoi diagram for planning multiple paths in free space,” Elsevier, vol. 3079, no. January, pp. 1–17, 2007.
[92] G. Birmele, Etienne and Ferreira, Rui and Grossi, Roberto and Marino, Andrea and Pisanti, Nadia and Rizzi, Romeo and Sacomoto, “Optimal Listing of Cycles and st-Paths in Undirected Graphs,” in Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, SIAM Press, 2013, pp. 1884–1896.
[93] A. Tsirigos and Z. J. Haas, “Multipath routing in the presence of frequent topological changes,” IEEE Commun. Mag., vol. 39, no. 11, pp. 132–138, 2001.
[94] A. Tsirigos and Z. J. Haas, “Analysis of multipath Routing-Part I: the effect on the packet delivery ratio,” IEEE Trans. Wirel. Commun., vol. 3, no. 1, pp. 138–146, 2004.
[95] M. E. J. Newman, “A measure of betweenness centrality based on random walks,” Soc. Networks, vol. 27, no. 1, pp. 39–54, 2005. Elsevier.
[96] U. Lauther, “An extremely fast, exact algorithm for finding shortest paths in static networks with geographical background,” Geoinf. und Mobilit{ä}t-von der Forsch. zur Prakt. Anwendung, vol. 22, pp. 219–230, 2004.
[97] D. Delling, A. V Goldberg, T. Pajor, and R. F. Werneck, “Customizable route planning,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2011, pp. 376–387.
[98] D. Delling and R. F. Werneck, “Faster customization of road networks,” in International Symposium on Experimental Algorithms, Springer, Berlin, Heidelberg, 2013, pp. 30–42.
[99] F. Schulz, D. Wagner, and C. Zaroliagis, “Using multi-level graphs for timetable information in railway systems,” Algorithm Eng. Exp. Springer, pp. 43–59, 2002.
[100] M. E. J. Newman, “Spectral methods for community detection and graph partitioning,” Phys. Rev. E, American Physcal Society, vol. 88, no. 4, p. 42822, 2013.
References