OpenStreetMap Foundation
Licensing Working Group
An analysis of Share-Alike Virality for Geocoding and Reverse-Geocoding
WORK IN PROGRESS!
This is a concept paper being prepared by the OpenStreetMap Foundation’s License Working Group prior to community consultation. The ideas expressed are just that, ideas. They do not represent Foundation policy.
Capitalised words follow the legal convention of being explicitly defined within this document for clarity.
Geocoding or Reverse-Geocoding returns a Record, i.e. a set of information. If that Record comes from the use of OpenStreetMap data, then it comes from data published under a Share-Alike license, specifically the Open Database License 1.0 (ODbL). If you now use that Record to enhance information that you already have, is that information now “virally touched”? That is, if it is published in any form, are you now obliged to publish it under the same Share-Alike license?
Currently we cannot answer that question except to be safe and say, “probably yes, you have to publish it”. The License Working Group is working on providing a more useful answer. So far, we believe the route to get there involves coming up with
We are of the view that allowing geocoding without triggering Share-Alike for most uses is beneficial to OpenStreetMap and open geo-spatial data because:
However, we can only do this if:
Here is the current status of our work in defining the tightest possible clear definition of what Geocoding is that encompasses real world use cases. We favour Alternative 2 provided it flies in the real world.
Geocoding sensu stricto:
Geocoding is the practice of providing ...
Alternative 1: a postal address, a building name or other unique information such as a telephone number ...
Alternative 2: a text string (the inexact form of an address) ...
[Only address information may be provided. Providing other information such as type of business or amenity is a filtered database extraction rather than geocoding]
and returning an exact marked up address and its location (such as a lat/lon pair), or a polygon that indicates the area of the place the text string corresponds to. This result is a geocoded result. Generally one location is returned and this is the Record that you will use. Multiple results may be returned in some situations, mainly due to ambiguity. In this case, only one may be selected as a Record and the others thrown away ... if you do not, then you are making a general database extraction and any special Share-Alike waivers do not apply.
Reverse-geocoding: Providing a lat/lon pair and deriving a postal address.
Bulk geocoding: Extracting more than an Insubstantial amount either in one go or over a period of time.
The Record: Whether geocoding or reverse geocoding, each resulting lat/lon or address is almost certainly going to be added to another set of information. We will call this the Record.
You: A person or organisation performing geocoding.
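The "only one may be selected as a Record" rule in the geocoding definition above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the Candidate type, its score field and the select_record function do not correspond to any real geocoder's API, and picking the highest score is just one assumed way of resolving ambiguity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One possible match returned by a hypothetical geocoder."""
    address: str   # exact, marked-up address
    lat: float
    lon: float
    score: float   # hypothetical match quality, higher is better


def select_record(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Keep exactly one candidate as the Record and discard the rest.

    Returns None when the geocoder found nothing. Keeping more than one
    candidate would, under the draft definition, be a general database
    extraction rather than geocoding.
    """
    if not candidates:
        return None
    # Assumption: ambiguity is resolved by taking the best-scoring match.
    return max(candidates, key=lambda c: c.score)
```

A caller would pass in every candidate the look-up returned and store only the single value that comes back.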
This section analyzes some general philosophy, in particular what the point of Share Alike is. Armed with that, we can go on to look at where it must be applied and where it could potentially be waived.
Proponents of Geocoding without Share Alike sometimes express frustration that Share Alike is just something in their way. It is therefore worth taking a small step back to consider why it is there.
Share-Alike is one of OpenStreetMap’s tools to expand the pool of open and free data available to the public, i.e. you. If you or your clients/customers are improving the map data, you should be improving the map data in OpenStreetMap, rather than improving the map data and keeping it for yourself.
If you are a commercial user of OpenStreetMap data, this can also be very useful to you in other ways. You leverage your own mapping efforts by simply adding to the now enormous OpenStreetMap pool, secure in the knowledge that your competitors cannot take that, do a small value add and then sell a “superior” competing proprietary product.
So, Share Alike is important and as long as OpenStreetMap has a Share Alike license, then in the grand scheme of things, it is far more important than Geocoding! Sorry, but there it is.
But as Share-Alike is a tool, there is one area to look at and that is whether there should be an edge to what is “touched” in terms of utility and relevance. Other information combined with a Record may or may not be of utility to OSM specifically or to the public generally.
One area the LWG is generally pursuing is whether we can define such edges ... Edge Principles. We have to be careful though. OpenStreetMap utility may be served by stating "restaurant reviews do not improve the map data but wheelchair access data does". On the other hand, this may not be a valid criterion because a) things may not be of interest today but may be tomorrow, and b) the point of share-alike is to share with everyone on the planet, not just the OpenStreetMap project.
That caveat aside, let us look next more closely at what is needed to define edge principles.
Option 1. Claim that everything is touched. If that path is taken, it is game over: Geocoding always virally touches the information it geocodes.
Option 2. Claim that information associated with a Record (often proprietary business information or personal information such as a patient record) is not virally touched by geocoding against OSM ODbL data.
To do Option 2, a distinction needs to be demonstrated. This distinction needs to be:
At the moment, we feel this does not exist. Anything that is tagged with a lat/lon becomes "geographic", so that cannot be used as a starting point. Further, reverse geocoding may extract from the OSM database lots of other information that goes well beyond the basic scope of lat/lon and an address or building name.
The distinction definitely needs to work with pure (reverse) geocoding:
Pure Geocoding: You provide ONLY a postal address, a building name or other unique information such as a telephone number and get back ONLY a lat/lon pair.
Pure Reverse-geocoding: You provide ONLY a lat/lon pair and get back ONLY a postal address or a building name or one other unique piece of information such as a telephone number.
It may also need to work with more complicated cases where you are gaining other information by adding filters to your request or asking for more than basic information to be returned.
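To make the two "pure" cases concrete, here is a minimal Python sketch that checks whether a request/response pair stays within them. The key names (postal_address, building_name, telephone, lat, lon) and both function names are assumptions made for illustration and are not OSM tags or part of any real service. The sketch also assumes one reading of the definitions, namely that exactly one identifying item is provided or returned.

```python
from typing import Mapping

# Hypothetical key names for the identifying information named in the text.
IDENTIFIER_KEYS = frozenset({"postal_address", "building_name", "telephone"})
LOCATION_KEYS = frozenset({"lat", "lon"})


def _is_single_identifier(fields: Mapping[str, object]) -> bool:
    """True if the mapping holds exactly one identifying item and nothing else."""
    return len(fields) == 1 and set(fields) <= IDENTIFIER_KEYS


def is_pure_geocoding(request: Mapping[str, object],
                      response: Mapping[str, object]) -> bool:
    """ONLY an identifier goes in, ONLY a lat/lon pair comes back."""
    return _is_single_identifier(request) and set(response) == LOCATION_KEYS


def is_pure_reverse_geocoding(request: Mapping[str, object],
                              response: Mapping[str, object]) -> bool:
    """ONLY a lat/lon pair goes in, ONLY one identifier comes back."""
    return set(request) == LOCATION_KEYS and _is_single_identifier(response)
```

Any request that adds a filter such as a type of amenity, or any response that carries extra tags, fails both checks and falls into the "more complicated cases" mentioned above.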
Does ODbL allow Edge Principles?
This is the focus of another document that will be merged here.
Another broad issue is the extent to which contributors in any given open data area should be allowed to define what is reasonable for that given area. Creative Commons have established this as a principle, at least for how to attribute, in CC4, and it is worth exploring further.
This is probably the easiest to apply since it is easy to define and can be coupled directly with the ODbL. The ODbL, unlike CC-BY-SA, allows insubstantial extractions to be made without triggering share-alike.
As an end user, it is always difficult to judge what is insubstantial and what is substantial. The OSM community have therefore created Community Guidelines, which are in turn accepted by the OSMF as consolidated publisher. http://wiki.openstreetmap.org/wiki/Open_Data_License/Substantial_-_Guideline
There is no specific text relevant to bulk (reverse)geocoding. We therefore need to add about one sentence to it.
As yet, there is no guideline for geocoding. More input required.
This concept is that if it is something that OpenStreetMap contributors might by consensus put on the OpenStreetMap database, then it is In Scope and Share-Alike should apply. If not, then it is Out of Scope and Share-Alike should not apply. If acceptable to OpenStreetMap contributors and the wider open IP community, then it may be very practical as it would leave out things like:
TODO: We have come up with a prototype definition of what OSM collects when considering “dynamic data” ... insert it here for discussion.
A concept has been suggested which we are now exploring further both for basic sanity and possible flaws as well as for how it benefits the promotion and expansion of open geo-data. It needs to be fair and practical to enforce. As with everything in OpenStreetMap, this should be thrown open for community discussion and may radically change as a result.
Loosely, the concept could be called the Like For Like principle. Whatever is used in the (reverse)geocoding look-up is virally touched, but nothing else.
"Used" would include
Broadly, the result we might want is that if you geocode a bunch of addresses, then:
Similarly, if you are looking at pubs only and you extract location, the fact that it is a pub and its name, then we want your pub locations and names back, but we might not want your pub reviews or other data that you have exclusively collected and NOT augmented from OSM.
1) You have a set of geographic objects with tags A, B, C and ADDRESS. ADDRESS is some parameter that can be used for geocoding, most likely an address or building name, but could be something else unique such as a telephone number. They may have lat/lons associated with them, or they may not. Or you might add them later if you cannot find them in OSM.
2) OSM has a set of geographic objects with tags C, D, E and ADDRESS. These always have a lat/lon. Some of the objects may also have the tags A and B.
If you SEND ADDRESS to OSM or to any service or program using a copy of OSM geodata, then you need to share all your ADDRESS.
If you EXTRACT a set of lat/lons from OSM data because they match your tag ADDRESS, then you are obliged to share your list of ADDRESS and any lat/lons you subsequently map them to but not A, B. If you keep any instance of OSM C in the set, then you need to share your own C. If you immediately discard all OSM C, then you do not need to share your own C. [So what about OSM D and E ... can they keep and use that??]
If you use a FILTER or REVERSE FILTER, then you need to share whatever you used to filter. So, if you ask for a list of all pubs with wheelchair access ... Then you need to share your list of pubs, together with lat/lon, and your information about wheelchair access. If you ask for a list of all pubs without wheelchair access, you still need to share your list of pubs and your information about wheelchair access. You do not need to share other information you have about pubs. [So does that mean you need to share amenity=pub, wheelchairaccess=*, but not name=* ??]
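The draft Like For Like rules above can be sketched as a small Python function. This is only one possible reading of a concept that is still under discussion, not a statement of what the ODbL requires. The function name like_for_like and its parameters are hypothetical. Tags kept from OSM that you did not already hold yourself (the "OSM D and E" question in the bracketed note) are returned separately as unresolved rather than decided either way.

```python
from typing import Iterable, Set, Tuple

LOCATION = frozenset({"lat", "lon"})


def like_for_like(own_tags: Iterable[str],
                  sent: Iterable[str],
                  filters: Iterable[str],
                  kept_from_osm: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (tags to share, tags whose status the draft leaves open).

    own_tags      -- tags you held before the look-up, e.g. A, B, C, ADDRESS
    sent          -- tags you sent to OSM or to a service using OSM data
    filters       -- tags used as a FILTER or REVERSE FILTER
    kept_from_osm -- tags whose OSM values you kept after the look-up
    """
    own = set(own_tags)
    kept = set(kept_from_osm)
    used = set(sent) | set(filters)      # whatever was used in the look-up

    share = set(used)
    share |= kept & LOCATION             # lat/lons you mapped your data to
    share |= kept & own                  # e.g. keeping OSM C means sharing your C

    # Kept OSM tags you never held yourself: an open question in the draft.
    unresolved = kept - own - LOCATION - used
    return share, unresolved
```

For the first scenario, calling like_for_like with own tags A, B, C and ADDRESS, sending ADDRESS and keeping only lat and lon yields ADDRESS, lat and lon to share, while A, B and C stay private. For the pub scenario, filtering on hypothetical tags amenity and wheelchair yields those two tags plus the kept location, and leaves reviews out.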
This can simply be stated as:
“If you perform geocoding or reverse geocoding and throw away the OSM data after providing a service, then you don't need to share anything.”
Before recommending, the LWG needs to consider two questions: