Our project aims to build a distributed data store designed specifically for geographic data that is accessed in much the same way as a wiki.

Scott Shawcroft - scott.shawcroft@gmail.com
Jason Kivlighn - jkivlighn@gmail.com

    We hope to provide a distributed system for storing map data from the OpenStreetMap project.  Briefly, OpenStreetMap aggregates public geodata sources such as TIGER/Line with user-generated geodata based on GPS tracks.  Currently, the project hosts all of its data on a single machine in a MySQL database.  Its wiki states that the community would like to move to a distributed system in order to better handle load.  That load is a combination of individuals reading and writing data in the database and programs pulling geodata in bulk in order to render tiles.  We hope to provide a new system which improves the response time for individual users, improves the wiki nature of mapping, and allows for scalability.

    One of the most challenging aspects will be designing a system which handles load efficiently.  Partitioning the dataset by location seems like an obvious optimization, but it could fail if all the traffic to the database is directed at a particular area.  For instance, if a mapping party has a number of people working on a particular location, then a simple geographic partitioning will still direct all the traffic to a single server.  However, if user queries are spread across many areas, then distributing the data will even out the load across the servers.  Furthermore, distributing the data amongst the servers based on geography will distribute the load more evenly during bulk data transfers.  These are just a few of the approaches we are considering in designing our system.
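    The trade-off above can be illustrated with a small sketch.  This is not our design, only one way to picture the problem: it compares a simple partitioning by longitude band with an alternative that hashes small grid tiles onto servers.  The tile size, the server count, and the function names (tile_of, server_by_region, server_by_tile_hash) are all assumptions made up for the illustration.

```python
import hashlib
import math

TILE_DEGREES = 0.1  # hypothetical tile size, in degrees
NUM_SERVERS = 4     # hypothetical number of servers


def tile_of(lat, lon, tile_degrees=TILE_DEGREES):
    """Return the integer (row, col) of the grid tile containing a coordinate."""
    row = math.floor((lat + 90.0) / tile_degrees)
    col = math.floor((lon + 180.0) / tile_degrees)
    return (row, col)


def server_by_region(lon, num_servers=NUM_SERVERS):
    """Simple geographic partitioning: one equal-width longitude band per server."""
    band = int((lon + 180.0) / 360.0 * num_servers)
    return min(band, num_servers - 1)  # lon == 180 belongs to the last band


def server_by_tile_hash(lat, lon, num_servers=NUM_SERVERS):
    """Hash the tile id so that neighbouring tiles are not tied to one server."""
    row, col = tile_of(lat, lon)
    digest = hashlib.md5(f"{row}:{col}".encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % num_servers


if __name__ == "__main__":
    # A made-up "mapping party": edits clustered in one metropolitan area.
    party = [(47.45 + 0.05 * i, -122.45 + 0.05 * j)
             for i in range(6) for j in range(6)]
    print("servers hit, by region:   ",
          sorted({server_by_region(lon) for _, lon in party}))
    print("servers hit, by tile hash:",
          sorted({server_by_tile_hash(lat, lon) for lat, lon in party}))
```

    Under the band scheme every edit in the cluster lands on the same server.  Hashing tiles can spread a cluster that spans several tiles, at the cost that a bulk read of one region may have to contact several servers.  Edits that fall inside a single tile still share one server under either scheme.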

    Our implementation will be relatively straightforward.  The OpenStreetMap wiki documents an API for access to the database, which we will implement.  This way our solution can be a drop-in replacement for their current system.  Besides the actual design of the system, we will be challenged by creating a maintainable system and the tools to support it.  For example, we will need to write and document the system in such a way that others more involved in the development of OpenStreetMap can maintain it.  We will also need tools to import bulk data and to migrate data from the current system to ours.

    Lastly, we want our system to be scalable.  However, this is not quite scalability in the sense of Google.  We want our system to scale well from a single machine to a few machines, not necessarily from 100 to 1000 machines.  This is due to the limited resources of the OpenStreetMap community and their desire to maintain a minimal number of machines.  We think, however, that we can use Amazon's Web Services to test the scalability of our system.  Development can occur locally at our house with the few heterogeneous machines we have there, but to test further scalability we will probably leverage AWS.

    So, we will work to implement a distributed geo data store and the supporting tools and documentation.  To achieve this we will work with the existing OpenStreetMap community and leverage AWS as a testing environment.

Resources:
    OpenStreetMap
    OSM Desire to Distribute Load
    OSM API 0.5 (Current)
    OSM API 0.6 (Under Development)
    OSM Design Structure
    OSM Current Database Setup