Welcome to GEOGM0068:
Geographic Information Retrieval and Integration
Rui Zhu
rui.zhu@bristol.ac.uk
GEOGM0068 - TB2 2024/25
Lecture 05
Assessment
Review: What Is Georeferencing?
Review: Geoparsing Pipeline
Tokenization
Split text into words/phrases, or a document into sentences
Tagging
Assign a part-of-speech (POS) tag to each token
Lookup
Look up lists of known locations (i.e., gazetteers), organizations, people, etc.
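The tokenization and lookup steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the course: the toy gazetteer `GAZETTEER` and the helpers `tokenize` and `lookup` are hypothetical, POS tagging is omitted (a real pipeline would use an NLP tagger), and lookup is a greedy longest-match over token n-grams.

```python
import re

# Hypothetical toy gazetteer: lower-cased place name -> candidate entries.
GAZETTEER = {
    "bristol": ["Bristol, England, UK", "Bristol, Connecticut, USA"],
    "new york": ["New York City, USA", "New York State, USA"],
    "washington": ["Washington, D.C., USA", "Washington State, USA"],
}

def tokenize(text):
    """Split text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)

def lookup(tokens, gazetteer, max_len=3):
    """Greedy longest-match lookup of token n-grams in the gazetteer.

    Returns (matched_name, candidates) pairs in order of appearance.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so "new york" wins over "new".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in gazetteer:
                matches.append((phrase, gazetteer[phrase]))
                i += n
                break
        else:
            i += 1  # no n-gram starting here matched; move on one token
    return matches

tokens = tokenize("She flew from Bristol to New York last week.")
for name, candidates in lookup(tokens, GAZETTEER):
    print(name, "->", candidates)
```

Each match still returns several candidates, which is exactly the ambiguity that the disambiguation step has to resolve.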
Lookup Resource - Gazetteers
Global coverage; traditional gazetteer
Global coverage; gazetteer with a stronger cultural and historical focus
Global coverage; not only a gazetteer; in graph format
Lookup Resource - Gazetteers
Other potential resources:
Pros and Cons of Simple Lookup
Pros
Cons
Variation in Place Names
Example (polysemy: the same name refers to different things):
Variation in Place Names
Example (synonymy: different names refer to the same place):
Zhu, R., Janowicz, K., Yan, B., & Hu, Y. (2016). Which Kobani? A case study on the role of spatial statistics and semantics for coreference resolution across gazetteers. In International Conference on GIScience Short Paper Proceedings (Vol. 1, No. 1).
Disambiguation
Types of Evidence:
Context
Disambiguation
Context
Selecting the Best Candidate
Similarity-based Approach
Similarity-based Approach
2. Compute the distance (dissimilarity) between each candidate place's vector and the target place name's vector.
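A minimal sketch of the similarity-based idea, under stated assumptions: the step that builds the vectors is assumed here to be a simple bag-of-words count over the words around the target mention and over a short textual description of each candidate. The candidate descriptions and the helpers `bow` and `cosine_distance` are hypothetical; real systems may use TF-IDF weights or learned embeddings instead. The candidate with the smallest distance is selected.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector: lower-cased word -> count."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

# Words surrounding the target mention "Washington" (toy example).
target_context = "the president spoke at the white house in washington about congress"

# Hypothetical candidate descriptions, e.g. drawn from a gazetteer or encyclopedia.
candidates = {
    "Washington, D.C.": "capital of the united states home of the white house and congress",
    "Washington (state)": "state in the pacific northwest known for seattle apples and rain",
}

target_vec = bow(target_context)
distances = {name: cosine_distance(target_vec, bow(desc))
             for name, desc in candidates.items()}
best = min(distances, key=distances.get)  # smallest distance = most similar
print(distances)
print("Best candidate:", best)
```

Here the shared words "white", "house", "congress" and "the" pull the D.C. description closer to the target context than the description of the state.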
Similarity-based Approach
Similarity-based Approach
Probabilistic Approach
How likely is it that “Washington” (the target place name) refers to “Washington, D.C.” (a candidate place) given the observed context?
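One way to make this question concrete is a naive Bayes estimate, P(candidate | context) ∝ P(candidate) × Π P(word | candidate). The sketch below assumes that technique. It is not necessarily the model presented in the lecture, and the prior and co-occurrence counts are hypothetical stand-ins for statistics that would be gathered from annotated text. Add-alpha (Laplace) smoothing keeps unseen words from driving a probability to zero, and log scores avoid numerical underflow.

```python
import math
from collections import Counter

# Hypothetical statistics from an annotated corpus: how often each candidate
# was the true referent (prior) and which words co-occurred with it.
prior_counts = {"Washington, D.C.": 70, "Washington (state)": 30}
context_counts = {
    "Washington, D.C.": Counter({"president": 20, "congress": 15, "white": 10,
                                 "house": 10, "capitol": 5}),
    "Washington (state)": Counter({"seattle": 20, "rain": 8, "apples": 6,
                                   "pacific": 6}),
}

def log_posterior(candidate, context_words, alpha=1.0):
    """Unnormalised log P(candidate | context) under a naive Bayes assumption."""
    vocab = set().union(*context_counts.values()) | set(context_words)
    score = math.log(prior_counts[candidate] / sum(prior_counts.values()))
    counts = context_counts[candidate]
    total = sum(counts.values())
    for w in context_words:
        # Add-alpha smoothing: unseen words get a small non-zero probability.
        score += math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
    return score

def posterior(context_words):
    """Normalise log scores into probabilities that sum to 1."""
    logs = {c: log_posterior(c, context_words) for c in prior_counts}
    m = max(logs.values())
    exp = {c: math.exp(s - m) for c, s in logs.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

print(posterior(["president", "white", "house"]))
```

With context words such as "president", "white" and "house", nearly all of the probability mass goes to Washington, D.C.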
Machine Learning Approach
Machine Learning Approach
Key idea: learn a low-dimensional vector (embedding) for each term or place name, so that semantically related terms or place names lie close together in the vector space.
Popular methods:
Other Approaches
Other Approaches
Other Approaches
Hu, Y., Mai, G., Cundy, C., Choi, K., Lao, N., Liu, W., ... & Joseph, K. (2023). Geo-knowledge-guided GPT models improve the extraction of location descriptions from disaster-related social media messages. International Journal of Geographical Information Science, 37(11), 2289-2318.
Hu, X., et al. (2024). Toponym resolution leveraging lightweight and open-source large language models and geo-knowledge. International Journal of Geographical Information Science, 1-28.
Prompt Engineering
Mai, G., Huang, W., Sun, J., Song, S., Mishra, D., Liu, N., ... & Lao, N. (2023). On the opportunities and challenges of foundation models for geospatial artificial intelligence. arXiv preprint arXiv:2304.06798.
Summary of Geoparsing
Evaluation Metrics
Recommended reading: Wang, J., & Hu, Y. (2019, November). Are we there yet? Evaluating state-of-the-art neural network based geoparsers using EUPEG as a benchmarking platform. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities (pp. 1-6).
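For the recognition step, geoparsers are commonly scored with precision, recall and F1. The sketch below assumes exact matching of toponym spans given as (start, end) character offsets. The helper `precision_recall_f1` and the example spans are hypothetical, and benchmarking platforms may apply looser matching rules such as partial overlap.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of (start, end) toponym spans."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # spans found by the system that are also in the gold set
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical character-offset spans of toponyms in one document.
gold = [(14, 21), (25, 33), (50, 56)]
predicted = [(14, 21), (25, 33), (40, 45)]
p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.67 F1=0.67
```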
Evaluation Metrics
Evaluation Metrics
Recommended reading: Liu, Z., Janowicz, K., Cai, L., Zhu, R., Mai, G., & Shi, M. (2022). Geoparsing: Solved or Biased? An Evaluation of Geographic Biases in Geoparsing. AGILE: GIScience Series, 3, 1-13.
Evaluation Metrics
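For the resolution step, a common family of metrics compares predicted and gold coordinates by great-circle distance. The summaries are the mean and median error and the share of predictions within 161 km (about 100 miles), often written Acc@161. This is a minimal sketch under those assumptions. The coordinates are approximate and for illustration only, and `haversine_km` and `distance_metrics` are hypothetical helpers that assume a spherical Earth of radius 6371 km.

```python
import math
from statistics import mean, median

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error

def distance_metrics(predicted, gold, threshold_km=161.0):
    """Mean/median error distance and accuracy within threshold_km."""
    errors = [haversine_km(*p, *g) for p, g in zip(predicted, gold)]
    return {
        "mean_km": mean(errors),
        "median_km": median(errors),
        "acc_at_161km": sum(e <= threshold_km for e in errors) / len(errors),
    }

# Approximate coordinates: gold = Bristol (UK) and Washington, D.C.
gold = [(51.4545, -2.5879), (38.9072, -77.0369)]
# Predictions: Bristol resolved correctly; "Washington" wrongly resolved to the state.
predicted = [(51.4545, -2.5879), (47.7511, -120.7401)]
print(distance_metrics(predicted, gold))
```

One wrong resolution of "Washington" is enough to push the mean error into the thousands of kilometres, while Acc@161 simply records it as one miss out of two.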
Recall: What Is Georeferencing?
Geocoding
Ambiguity in Geocoding
Example:
Geocoding Approaches Overview
Knowledge-based Approach
Knowledge-based Approach Example
Works well in this case
Fails in this case
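To illustrate why a knowledge-based rule can work in one case and fail in another, here is a sketch of one widely used heuristic, assumed for illustration and not necessarily the rule on the slide: always pick the most populous gazetteer candidate. The gazetteer entries and population figures below are illustrative, and `resolve_by_population` is a hypothetical helper.

```python
# Hypothetical gazetteer entries with illustrative population figures.
CANDIDATES = {
    "Paris": [
        {"name": "Paris, France", "population": 2_100_000},
        {"name": "Paris, Texas, USA", "population": 25_000},
    ],
}

def resolve_by_population(toponym, gazetteer):
    """Assumed knowledge-based heuristic: return the most populous candidate.

    The surrounding text is never consulted, so the answer is the same
    whatever the context.
    """
    candidates = gazetteer.get(toponym, [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["population"])["name"]

print(resolve_by_population("Paris", CANDIDATES))  # Paris, France
```

This gives the right answer for "a weekend in Paris and Lyon" but the wrong one for "we drove from Dallas to Paris", because the rule never consults the context.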
Again, Context is the Key!
Data-driven or Supervised Approach
Map-based Approach
Waldo Tobler
1930 - 2018
Recommended Readings: Tobler, W. (2004). On the first law of geography: A reply. Annals of the Association of American Geographers, 94(2), 304-310.
Zhu, R., Janowicz, K., & Mai, G. (2019). Making direction a first‐class citizen of Tobler's first law of geography. Transactions in GIS, 23(3), 398-416.
Map-based Approach Example
1. Compute the distance from each contextual place to each candidate place.
2. Average these distances for each candidate.
3. Select the candidate with the smallest average distance as the geocoded result.
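The three steps of the map-based approach translate directly into code. This is a minimal sketch assuming great-circle (haversine) distance on a spherical Earth. The coordinates are approximate and illustrative, and `haversine_km` and `map_based_geocode` are hypothetical helpers.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def map_based_geocode(candidates, context_places):
    """Return the candidate whose average distance to the context places is smallest."""
    def avg_distance(coord):
        # Steps 1 and 2: distance to every contextual place, then the average.
        dists = [haversine_km(*coord, *ctx) for ctx in context_places]
        return sum(dists) / len(dists)
    # Step 3: the candidate with the minimal average distance.
    return min(candidates, key=lambda name: avg_distance(candidates[name]))

# Approximate coordinates of the candidates for the toponym "Paris".
candidates = {
    "Paris, France": (48.8566, 2.3522),
    "Paris, Texas, USA": (33.6609, -95.5555),
}
# Other places mentioned in the same text: Dallas and Austin (approximate).
context = [(32.7767, -96.7970), (30.2672, -97.7431)]
print(map_based_geocode(candidates, context))  # Paris, Texas, USA
```

Following Tobler's first law, nearby context places make the Texan candidate the most plausible referent. With London in the context instead, the same function would pick Paris, France.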
More Challenges
Summary of Geocoding
Summary of Georeferencing