1 of 162

Geocoding

without Programming

Patty Frontiera

pattyf@berkeley.edu

March 18, 2016

3 of 162

Workshop Overview

  • What is Geocoding?
  • Why do it?
  • Options
  • Considerations
  • Examples
  • Geocoding Exercises (sprinkled throughout)

4 of 162

What is Geocoding?

Obtain geographic coordinates for a place name, address or code

city, building, mountain, landmark,

street address, intersection,

hazardous waste sites,

crime or other event location,

zip code, etc.

5 of 162

Geocoding

Get geographic coordinates for a

place name, address or code

Sather Gate, UC, Berkeley

37.87014, -122.2595

GEOCODING SOFTWARE

6 of 162

Sather gate

8 of 162

Try Geocoding

Web maps use geocoding software whenever you search for a place name, code or address.

Enter a place name in the search bar at www.openstreetmap.org

Then, copy and paste the returned coordinates in maps.google.com, and see how well the geocoder works.

9 of 162

Reverse Geocoding

Get the place name, code or address for geographic coordinates

10 of 162

Reference Database

Geocoding requires a reference database against which place names, codes and addresses are matched.

Input data are compared to the reference database and best matches are retrieved.

Reference database includes:

  • Text information
  • Geospatial data - points, lines or polygons
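To make the matching idea concrete, here is a minimal sketch of a toy reference database with a forward and a reverse lookup. The table, the function names and the exact-match rule are assumptions for illustration only; the two entries reuse coordinates quoted elsewhere in these slides, and real geocoders use far larger databases and fuzzier matching.

```python
# Toy reference database: text information plus point geometry (lat, lon).
# Hypothetical table; both entries reuse coordinates quoted in these slides.
REFERENCE_DB = {
    "sather gate, uc, berkeley": (37.87014, -122.2595),
    "7305 edgewater dr, oakland, ca, 94621": (37.7446, -122.2063),
}


def normalize(text):
    """Lower-case and collapse whitespace so trivial differences still match."""
    return " ".join(text.lower().split())


def geocode(query):
    """Return (lat, lon) for an exact normalized match, else None."""
    return REFERENCE_DB.get(normalize(query))


def reverse_geocode(lat, lon):
    """Return the reference entry nearest to (lat, lon).

    Compares squared differences in degrees, which is only adequate
    for a toy table like this one.
    """
    return min(
        REFERENCE_DB,
        key=lambda name: (REFERENCE_DB[name][0] - lat) ** 2
        + (REFERENCE_DB[name][1] - lon) ** 2,
    )


print(geocode("Sather Gate,  UC,  Berkeley"))  # (37.87014, -122.2595)
print(reverse_geocode(37.75, -122.21))
```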

11 of 162

Geographic Coordinates

Latitude: ±90 degrees, how far N/S of the Equator

Longitude: ±180 degrees, how far E/W of the Prime Meridian

DMS vs Decimal Degrees

37° 52' 12"N, 122° 15' 36" W

37.870145, -122.25952
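The conversion from DMS to decimal degrees is simple arithmetic: degrees + minutes/60 + seconds/3600, negated for S and W. A small sketch, with a function name chosen here for illustration:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


lat = dms_to_decimal(37, 52, 12, "N")
lon = dms_to_decimal(122, 15, 36, "W")
print(round(lat, 4), round(lon, 4))  # 37.87 -122.26
```

The DMS pair on the slide is rounded to whole seconds, which is why it converts to 37.87, -122.26 rather than to every digit of the decimal pair.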

12 of 162

Geographic Coordinate Reference Systems

Lines of latitude and longitude are part of a geographic coordinate reference system (GCS).

There are several widely used GCSs including

  • WGS84: World Geodetic System of 1984
  • NAD83: North American Datum of 1983
    • You need to know which one your data are using!

13 of 162

Map Projections / Projected Coord Systems

Map = flat representation of the non-flat Earth

Map projection = mathematical transformation from 3D surface to 2D plane.

14 of 162

Coordinate System & Map Projections

A working knowledge of geographic coordinate systems and map projections is very important.

Good online references

  • ESRI document Understanding Map Projections.
  • http://kartoweb.itc.nl/geometrics/index.html

15 of 162

Geographic Coordinates

Try it - enter the decimal degree coordinates in Google Maps search bar.

What order are the coordinates in?

  • How do you know?
  • What happens if you change the order of the coordinates?

How many decimals can you delete and still be at same location?

37.870058, -122.257944
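If you want to check your answers numerically, the sketch below tests whether a pair is plausible as (latitude, longitude) and estimates how far a point moves when decimals are dropped. It assumes a spherical Earth with roughly 111,320 m per degree of latitude; the function names are made up for this example.

```python
import math


def plausible_lat_lon(lat, lon):
    """True if the pair is within the valid ranges for (latitude, longitude)."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


def rounding_shift_m(lat, lon, decimals):
    """Approximate ground shift in meters caused by rounding both coordinates.

    Assumes a spherical Earth with about 111,320 m per degree of latitude;
    a degree of longitude shrinks by cos(latitude).
    """
    m_per_deg = 111_320.0
    dlat = (round(lat, decimals) - lat) * m_per_deg
    dlon = (round(lon, decimals) - lon) * m_per_deg * math.cos(math.radians(lat))
    return math.hypot(dlat, dlon)


lat, lon = 37.870058, -122.257944
print(plausible_lat_lon(lat, lon))  # True
print(plausible_lat_lon(lon, lat))  # False: -122.26 is not a valid latitude

# Shift in meters when keeping 5, 4, 3, 2, then 1 decimal places.
for d in range(5, 0, -1):
    print(d, round(rounding_shift_m(lat, lon, d), 1))
```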

16 of 162

Why Geocode?

  • Display locations on a map

  • Link those locations to other data

  • Perform Spatial analysis
    • Calculate distance, direction, area, etc.
    • Identify patterns & relationships
      • clusters, outliers, neighbors
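As one example of what coordinates make possible, here is a sketch of a great-circle distance calculation using the haversine formula. It assumes a spherical Earth of radius 6,371 km, and it reuses two coordinate pairs quoted in these slides; GIS software would normally do this for you.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


sather_gate = (37.87014, -122.2595)
edgewater_dr = (37.7446, -122.2063)
print(round(haversine_km(*sather_gate, *edgewater_dr), 1), "km")
```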

18 of 162

Web Scraping

http://www.co.contra-costa.ca.us/DocumentCenter/Home/View/2462

19 of 162

http://www.meganslaw.ca.gov/

20 of 162

How to Geocode

The Street Address Geocoding Process

25 of 162

Address Geocoding

  1. Input street addresses
     7305 Edgewater Dr, Oakland, CA, 94621
  2. Parse addresses to primary components
     7305 | Edgewater Dr | Oakland | CA | 94621
  3. Compare to reference database
  4. Output coordinates & metadata for best match
     37.7446, -122.2063
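The parse step can be sketched in a few lines. This is a deliberately naive parser that assumes comma-delimited input in exactly the order "number street, city, state, zip"; the function name and field names are illustrative, and real geocoders handle many more layouts.

```python
def parse_address(address):
    """Split 'number street, city, state, zip' into primary components.

    Assumes comma-delimited input in exactly that order.
    """
    street_part, city, state, zip_code = [p.strip() for p in address.split(",")]
    number, _, street = street_part.partition(" ")
    return {
        "number": number,
        "street": street,
        "city": city,
        "state": state,
        "zip": zip_code,
    }


parts = parse_address("7305 Edgewater Dr, Oakland, CA, 94621")
print("|".join(parts.values()))  # 7305|Edgewater Dr|Oakland|CA|94621
```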

26 of 162

You

  • Prepare Input Addresses
  • Review output & Repeat

Geocoding Software

  • Parse addresses to primary components
  • Compare to reference database
  • Output Coordinates & metadata

7315 Edgewater Dr, Oakland, CA, 94621

7305 Edgewater Dr, Oakland, CA, 94621

7305| Edgewater Dr|Oakland | CA | 94621

37.7446,-122.2063

27 of 162

Geocoding reference data set

  • The quality of the reference data largely determines quality of output
    • Typically street centerline data, sometimes parcel or point

7305

real location

28 of 162

Try It - Geocode this address

2700 Bancroft Way, Berkeley, CA 94704

Google Maps - maps.google.com

OpenStreetMap - www.openstreetmap.org

Try making spelling mistakes or leaving out parts to see how well the geocoder can deal with less than perfect data.

30 of 162

Geocoding & You

Input street addresses

Compare to reference database

Parse addresses

Output

coordinates & metadata for best match

Clean addresses

7305 Edgewater Drive # D, Oakland, 94621

7305 Edgewater Drive #D, Oakland, 94621

7305 Edgewater Drive, Oakland, CA, 94621

remove unnecessary components

add necessary components

31 of 162

Geocoding & You

Most geocoders will be able to geocode ~ 80% of your addresses with little cleaning and get you within a block of the actual location.

32 of 162

Comparing output quality

TAMU, OSM, Google, Bing, Yahoo, Here

% Matched: 100%, 80%, 70%, 100%, 100%, 90%, 100%

Average ∆ (ft): 24,265; 3,439; 894; 104; 166; 70; 92

Free

Freemium

Source: SmartyStreets.com, 2016

33 of 162

Geocoding & You

If you need high quality output you need to understand

  • what can go wrong
  • what you can improve
  • what you can & should pay for

34 of 162

Geocoding & You

WARNING:

If you are working with restricted use data, your options will be much more limited and the costs of getting high quality output much greater.

35 of 162

Cleaning & Standardizing Addresses

Lots you can do.

Brief review - get slides for reference.

36 of 162

Cleaning & Standardizing Addresses

  • provide all components consistently
  • standardize delimiter (comma or semi-colon, etc)
  • remove duplicates
  • Remove extra spaces and commas
  • Remove odd characters like “/” “l”, “@”
  • Use full names for streets
    • Martin Luther King Jr Way not MLK

37 of 162

Cleaning & Standardizing Addresses

  • Intersection format
    • Corner of Main and Long Ave > Main & Long

  • Numbered streets
    • Fourth > 4th

  • Directional Prefixes & Suffixes
    • North, No, South, etc > N, S, E, W, NW
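Several of these rules can be applied in batch. The sketch below covers extra spaces, intersection format, numbered streets and directionals. The lookup tables are illustrative, not exhaustive, and the rules are blunt: a street actually named "North St" would be abbreviated too, so review the output.

```python
import re

# Lookup tables are illustrative, not exhaustive.
DIRECTIONS = {
    "north": "N", "no": "N", "south": "S", "so": "S",
    "east": "E", "west": "W",
    "northeast": "NE", "northwest": "NW",
    "southeast": "SE", "southwest": "SW",
}
ORDINALS = {
    "first": "1st", "second": "2nd", "third": "3rd",
    "fourth": "4th", "fifth": "5th",
}


def standardize(address):
    """Apply a few of the cleaning rules from the slides to one address string."""
    text = re.sub(r"\s+", " ", address).strip()            # remove extra spaces
    text = re.sub(r"^corner of ", "", text, flags=re.IGNORECASE)
    text = re.sub(r" and ", " & ", text, flags=re.IGNORECASE)
    words = []
    for word in text.split(" "):
        key = word.lower().rstrip(".")                      # "North." -> "north"
        words.append(DIRECTIONS.get(key) or ORDINALS.get(key) or word)
    return " ".join(words)


print(standardize("Corner of Main and Long Ave"))  # Main & Long Ave
print(standardize("123  North.  Fourth St"))       # 123 N 4th St
```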

38 of 162

Intersections

“&” and “AND” are most common for US Streets

39 of 162

Directional Prefixes / Suffixes

Should be in form: N, S, E, W, NW, etc.

No periods, dashes, or full words!

preferred

40 of 162

PO Box

Change the following to PO Box

BX, P O Box, POBOX, PO BOX, P OBOX, POB, PMB, PO Drawer, POST OFFICE DRAWER, PBS Box ZIP

PO Box 123 (456 Main St) > 456 Main St

PO Box 123 or 456 Main St > 456 Main St
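A regular expression can handle most of these variants. The sketch below is an assumption about how one might do it, not a complete rule set: it only rewrites a variant when a box number follows, and it leaves the "PBS Box ZIP" case from the list untouched.

```python
import re

# Variants taken from the slide, longer spellings first.
PO_BOX_VARIANTS = (
    r"(?:POST OFFICE DRAWER|PO DRAWER|P O BOX|P OBOX|PO BOX|POBOX|POB|PMB|BX)"
)
# Only match when a box number follows the variant.
PO_BOX_RE = re.compile(r"\b" + PO_BOX_VARIANTS + r"\b\.?\s*(?=\d)", re.IGNORECASE)
# "PO Box 123 (street)" or "PO Box 123 or street"
WITH_STREET_RE = re.compile(r"^PO Box \d+\s*(?:\((.+)\)|or\s+(.+))$", re.IGNORECASE)


def normalize_po_box(text):
    """Rewrite PO Box spelling variants to 'PO Box '."""
    return PO_BOX_RE.sub("PO Box ", text)


def prefer_street(text):
    """If a PO Box is given alongside a street address, keep only the street."""
    match = WITH_STREET_RE.match(text)
    if match:
        return (match.group(1) or match.group(2)).strip()
    return text


print(normalize_po_box("POBOX 12"))                  # PO Box 12
print(prefer_street("PO Box 123 (456 Main St)"))     # 456 Main St
print(prefer_street("PO Box 123 or 456 Main St"))    # 456 Main St
```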

41 of 162

http://www.albany.edu/faculty/ttalbot/Geocoding_Lecture_2015.pdf

42 of 162

Data entry errors are hard to find & fix!

  • Typos:
    • Eb Court vs Ebb Court
    • Willowbrook Rd vs. Willow Brook Rd
    • Hihg St for High St

  • Missing components
    • e.g., no city, zip, street number
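If you have a list of valid street names, fuzzy string matching can flag likely typos. This sketch uses `difflib.get_close_matches` from the Python standard library; the street list and the 0.8 cutoff are assumptions for illustration, and suggestions should be reviewed by a person before they are applied.

```python
import difflib

# Hypothetical list of valid street names for the study area.
KNOWN_STREETS = ["High St", "Ebb Court", "Willow Brook Rd"]


def suggest_street(name, known=KNOWN_STREETS, cutoff=0.8):
    """Return the closest known street name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_street("Hihg St"))   # High St
print(suggest_street("Eb Court"))  # Ebb Court
print(suggest_street("Zzz"))       # None
```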

43 of 162

Preventing Problems

If you are collecting address data via surveys try to create surveys that minimize the likelihood of address input errors.

44 of 162

Problems in Reference Data

Errors

  • Incorrect street ranges
  • Inaccurate or low quality features
  • Inaccurate feature attributes

Other

  • Missing streets
  • Address changes

45 of 162

Oversimplified Street Features

http://www.albany.edu/faculty/ttalbot/Geocoding_Lecture_2015.pdf

46 of 162

Geocoding match rates can vary by location type

http://www.albany.edu/faculty/ttalbot/Geocoding_Lecture_2015.pdf

47 of 162

Cleaning & Standardizing Addresses

  • Extremely important
  • Lots of work
  • Unlikely to get it perfect

48 of 162

Address Formatting

Prepare data in format required by geocoder

49 of 162

Address Formats

Single field format

Multi field input

  • Are multiple columns required?

  • Header rows?

  • Watch commas as column separators!
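The comma problem is easy to see in code. When a single address field contains commas, a CSV writer has to quote that field or it will be read back as several columns. A small sketch with Python's standard `csv` module, using one row of the sample data with commas added to the address:

```python
import csv
import io

rows = [
    ["ID", "Store", "Address"],
    [1, "Wah Fay Liquors", "2101 8th Ave, Oakland, CA 94606"],
]

# The default csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading it back yields three columns per row, not five.
parsed = list(csv.reader(io.StringIO(text)))
print(len(parsed[1]))  # 3
```

Tools that split naively on commas will not honor the quotes, which is why some geocoders ask for addresses without commas at all.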

50 of 162

Tips for processing lots of addresses

  • Test on small sample of addresses
  • Sort your addresses by state, city, zip
  • Split input addresses into smaller files
  • Assess quality of geocoding output
  • Rinse & repeat - iterative process
  • Consider output format and what’s next

You may still need programming!
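For example, splitting a long address list into smaller batches takes only a few lines. A minimal sketch; the batch size of 1,000 and the made-up addresses are assumptions for illustration.

```python
def split_into_batches(addresses, batch_size=1000):
    """Yield consecutive slices of at most batch_size addresses."""
    for start in range(0, len(addresses), batch_size):
        yield addresses[start:start + batch_size]


sample = [f"{n} Main St" for n in range(1, 2502)]  # 2,501 made-up addresses
batches = list(split_into_batches(sample))
print([len(b) for b in batches])  # [1000, 1000, 501]
```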

51 of 162

Criteria for Selecting a Geocoder

  • Cost
  • Robustness & Sophistication
  • Speed & scalability
  • Geographic and temporal scope
  • Output quality, type & metadata
  • Security - Remote vs. local software
  • Ease of use

53 of 162

Free/Freemium Online Geocoding Services

That do not require programming!

  • U.S. Census Geocoder
  • Texas A&M (TAMU) Geocoder
  • Google Earth Pro
  • MapBox, MapZen, Geocode.io, CartoDB.com

54 of 162

Free Local (On Premise) Geocoders

  • ESRI ArcGIS Desktop free with UC site license

You need a local reference database to be completely offline.

55 of 162

Geocoding without Programming

56 of 162

Google Fusion Tables

When you just want to make a map!

57 of 162

Google Fusion Tables

58 of 162

Google Fusion Tables

Requires location data in one column

  • That means no commas

ID,Store,Address

1,Wah Fay Liquors,2101 8th Ave Oakland CA 94606

2,Vision Liquor,1615 Macarthur Blvd Oakland CA 94602

3,Souza's Liquors,394 12th St Oakland CA 94607

4,Tk Liquors,1500 23th Ave Oakland CA 94606

5,Quadriga Wines Inc,6193 Ridgemont Dr Oakland CA 94619

59 of 162

Google Fusion Tables

You can input location as place name or street address.

67 of 162

Google Fusion Tables

What we like:

  • Free, fast, easy mapping tool
  • International geocoding
  • Supports place name or address geocoding

Not so much:

  • You can’t get the coordinates

68 of 162

Try Geocoding with Google Fusion Tables

Use the sample address data.

Try oak_liq_w_ids.csv file.

  • Make any needed formatting changes
  • Or use the file in the formatted folder

oak_liq_gfusion_format.csv

  • Remember: the location information must be in one column!
  • Remove commas separating address components - why?

69 of 162

US Census Geocoder

http://geocoding.geo.census.gov/

70 of 162

Census Geocoder

Two options - (1) Find Locations & (2) Find Geographies

(2) returns codes that let you link to census data!

71 of 162

Census Geocoder

Let’s try it

DEMO: http://geocoding.geo.census.gov/

  • Geocode an address you are familiar with
  • Check result in maps.google.com, or similar

72 of 162

https://www.census.gov/geo/maps-data/data/geocoder.html

74 of 162

  • No header row
  • Only ID & address columns
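If your file has a header row and extra columns, a short script can strip them. This sketch drops the header and the Store column from two rows of the sample data; the column names come from that sample, and the helper name is made up. Check the Census Geocoder documentation for the exact column layout it expects before uploading.

```python
import csv
import io

raw = """ID,Store,Address
1,Wah Fay Liquors,2101 8th Ave Oakland CA 94606
2,Vision Liquor,1615 Macarthur Blvd Oakland CA 94602
"""


def to_id_address_rows(csv_text, id_col="ID", address_col="Address"):
    """Keep only the ID and address columns; the header row is consumed, not copied."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[row[id_col], row[address_col]] for row in reader]


out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(to_id_address_rows(raw))
print(out.getvalue())
```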

75 of 162

oak_liq_stores.csv

20 Oakland Liquor Store Addresses (subset - grabbed online)

76 of 162

Census Geocoder Output

Your input address

77 of 162

Census Geocoder Output

Geocoder output - note match quality metadata.

78 of 162

Census Geocoder Output

Need to split coordinates into two columns (lon and lat) before you can map.

79 of 162

Census Geocoder Output

Census FIPS codes!

80 of 162

Post-processing in Google Sheets

  • Add two columns, then use split function to populate with longitude and latitude
  • Add header row
  • Download to CSV to open in GIS or Mapping software
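The same split can be done outside a spreadsheet. A minimal sketch, assuming the coordinate column holds a "lon,lat" string with longitude first, as the slides describe; verify the order against your own output before mapping.

```python
def split_coordinates(value):
    """Split a 'lon,lat' string into two floats (longitude first)."""
    lon_text, lat_text = value.split(",")
    return float(lon_text), float(lat_text)


lon, lat = split_coordinates("-122.2063,37.7446")
print(lon, lat)  # -122.2063 37.7446
```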

81 of 162

View geocoded output in

geojson.io

82 of 162

Map & Analyze output in QGIS, ArcGIS

83 of 162

Census Geocoder - Likes

  • Simple, and easy to use
  • Relatively fast
  • Free & unlimited (but only 1000 at a time)
  • Returns FIPS codes - so you can link to census data
  • Based on and thus lines up with census geographies

84 of 162

Census Geocoder - Not so much

  • Addresses only
  • 1000 address limit (at a time - not in total)
  • Underlying reference database is solid but not great
  • Only for USA
  • Online – not good for restricted use data
  • Need to delete non-address fields and re-add
  • Point data output in single column - need to split

85 of 162

Try Census Geocoder

Geocode sample data oak_liq_w_ids.csv file.

  • You may need to preprocess
  • You can also grab the file in the formatted folder

  • View results in geojson.io
    • You need to postprocess in google sheets to split coordinates

86 of 162

Google Earth Pro Geocoder

88 of 162

Input format - Requires header row

98 of 162

Save output to KML or KMZ

99 of 162

Open Google Earth KML in QGIS or ArcGIS

100 of 162

You can open in geojson.io & save to csv
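If you prefer a script to geojson.io, point placemarks can be pulled out of a KML file with Python's standard library. KML stores coordinates as "lon,lat[,alt]". The sketch assumes the file uses the standard KML 2.2 namespace and contains simple Point placemarks; the sample document and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumes the standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

sample_kml = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>7305 Edgewater Dr</name>
      <Point><coordinates>-122.2063,37.7446,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def kml_points_to_rows(kml_text):
    """Return one dict per Point placemark with name, lat and lon."""
    root = ET.fromstring(kml_text)
    rows = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            ".//kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        if not coords.strip():
            continue  # skip placemarks without point geometry
        lon, lat = coords.strip().split(",")[:2]  # KML order is lon,lat[,alt]
        rows.append({"name": name, "lat": float(lat), "lon": float(lon)})
    return rows


rows = kml_points_to_rows(sample_kml)
print(rows)
```

The resulting rows can be written out with `csv.DictWriter` for use in other software.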

101 of 162

Now you can process in

QGIS, ArcGIS, R, Python, Stata, etc.

102 of 162

Google Earth Pro - Cities, Global Coverage

cities.csv

ID,CITY,STATE

1,Boston,MA

2,New York,NY

3,Ipswich,MA

4,Paris,France

addresses.csv

ID,STREET,CITY,STATE

1,18 Grove St,Boston,MA

2,727 5th Ave,New York,NY

3,246 High St,Ipswich,MA

4,42 Rue d’Anjou,Paris,France

105 of 162

Geocoding with Google Earth Pro

Like 💝💖💖

  • Fantastic accuracy, global coverage, ease of use, robustness

Not so much

  • Limited to 2,500 locations per day
  • No CSV output
  • No metadata with output - can’t assess quality

106 of 162

Try Geocoding with Google Earth Pro

Geocode sample data oak_liq_w_ids.csv file.

  • You may need to preprocess
  • You can also grab the file in the formatted folder

Install the software if needed or partner with someone.

107 of 162

ArcGIS esri.com

108 of 162

Why ArcGIS?

  • Familiarity
  • Customizable
  • Robust and sophisticated parser & matcher
  • Extensive documentation and support

109 of 162

Why ArcGIS?

  • Great re-matching & post-geocoding functionality
  • No limits on # of addresses with local reference data
  • Add your own local reference data
  • Fast: ~ 1 million addresses in less than an hour

110 of 162

BUT ArcGIS….

  • Runs best on Windows PC,
    • slower on Mac
    • not easy to install

  • Expensive software
    • but campus site license

  • Expensive reference data via ArcGIS Online
    • free alternatives include US Census Tiger Data & ESRI NA Streets data

111 of 162

First 5,000 free - $4,000 for 1,000,000 addresses

ArcGIS Online Geocoder - ESRI credits

https://developers.arcgis.com/en/features/geocoding/

112 of 162

ArcGIS Geocoder with ESRI NA Streets data

  • Local data for geocoding addresses in ArcGIS only
  • You install it locally on your computer or local network
  • UCB folks can get a copy for instruction or research.
  • Data circa 2009
  • Big data set (8 GB)

113 of 162

ArcGIS Business Analyst 2015

2014 Street Data & Geocoding Software

Requires

  • ArcGIS 10.3.1
  • Business Analyst extension
  • Business Analyst 2015 DATA
  • Big data set - 30 GB

114 of 162

Why Use ArcGIS & ESRI Local Streets data

  • You want to use ArcGIS
  • When you have lots (> 2500) of addresses to geocode.
  • You don’t want to pay for ArcGIS Online geocoding
  • You need to work offline
    • e.g., you are working with restricted access data

116 of 162

Geocoding in ArcMap

STEP 1: Load your address data in ArcMap

  • csv, txt, xlsx formats

117 of 162

STEP 2: Right-click on the file name in the layer list to access geocoder

119 of 162

NA Streets data - several Address Locators

Browse to the folder streetmap_na/data

and choose Street_Addresses_US

127 of 162

Review Geocoding Output

129 of 162

Red = Census

Blue = Google

Purple = ArcGIS

132 of 162

392K addresses in ~ 15 minutes

What to do about the 29K unmatched?

133 of 162

Post-process results

134 of 162

Review unmatched addresses

135 of 162

Poor match because the street address is missing!

136 of 162

Iterative Process!

Exploring unmatched records helps you get a sense of the categories of problems in your data, some of which you can go back and fix in batch.

Some errors will remain with large datasets.

137 of 162

Always Review Sample of Output

  • Visually inspect output quality

  • Check match score - proportion of low quality matches (< 75-80 %)

  • Proportion of unmatched addresses
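These two proportions are easy to compute once you have the match scores in a list. A minimal sketch: the scores are made-up values, `None` stands for an unmatched address, and the threshold of 80 follows the range given above.

```python
def summarize_matches(scores, threshold=80):
    """scores: one match score per input address, or None when unmatched."""
    total = len(scores)
    unmatched = sum(1 for s in scores if s is None)
    low = sum(1 for s in scores if s is not None and s < threshold)
    return {
        "pct_unmatched": 100 * unmatched / total,
        "pct_low_quality": 100 * low / total,
    }


scores = [100, 95, 72, None, 88, 60, 100, None, 91, 79]  # made-up values
print(summarize_matches(scores))
# {'pct_unmatched': 20.0, 'pct_low_quality': 30.0}
```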

138 of 162

Which Geocoder?

  • Google - if you can!
    • But watch out for terms of use

  • ESRI ArcGIS with NA Streets or BA2015 data
    • if restricted use data or exceed google limits

  • Try hybrid solutions: Google + ESRI

  • Custom locator if you have a reference data set

  • Census Geocoder if not restricted access data and exceed Google limits + want FIPS

139 of 162

Questions?

Links to materials on D-Lab website.

Thank you!

140 of 162

Extras!

Additional info on:

  • Geocoding with your own reference data
  • Getting Census FIPS Codes for geocoded locations.

141 of 162

Geocoding with Your Reference Data

142 of 162

Geocoding with Your Reference Data

Assumption

  • You have or can get the reference data you need

Why?

    • You can’t otherwise get the data
    • It’s better than any other data
    • Any other data would be too expensive
    • Historical research - you can’t use current data

143 of 162

Example: Buenos Aires, Argentina

...

144 of 162

First Try ArcGIS Online Geocoder

150 of 162

Select the most precise Address Locator Style supported by the data.

151 of 162

In order to create a US Address Dual Ranges style locator, you need to identify columns for:

  • right & left side number ranges
  • street name
  • city (added)
  • state (added)

152 of 162

You must change the Role to Primary Table!

153 of 162

Save your address locator

156 of 162

Custom locator with user supplied reference data

AGOL World Geocoder

157 of 162

Linking Geocoded Addresses to Census Data

A few slides on that.

Be sure to select correct version of the census TIGER files for your analysis needs! See census website for details.

158 of 162

Download Census Data

Census Tracts - CA Alameda County

http://www2.census.gov/geo/tiger/TIGER2014/TRACT/tl_2014_06_tract.zip

Block Groups

http://www2.census.gov/geo/tiger/TIGER2014/BG/tl_2014_06_bg.zip

Blocks

http://www2.census.gov/geo/tiger/TIGER2014/TABBLOCK/tl_2014_06_tabblock10.zip

159 of 162

Add Census data to map

  1. Insert new data frame
  2. Copy tract data to it
  3. Copy geocoded results
  4. Right click on geocoded results

  • Data > Export Data
  • Save with same coords as the data frame
  • Save to shapefile
  • Intersect Analysis Tool
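Conceptually, the intersect step asks which polygon each geocoded point falls in. The sketch below shows that idea with a plain ray-casting point-in-polygon test. It is not how ArcGIS implements its Intersect tool; the two rectangular "tracts" and their IDs are made up, and real tract boundaries would come from the TIGER files.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "tracts" standing in for real census polygons.
TRACTS = {
    "TRACT_A": [(-122.3, 37.7), (-122.2, 37.7), (-122.2, 37.8), (-122.3, 37.8)],
    "TRACT_B": [(-122.2, 37.7), (-122.1, 37.7), (-122.1, 37.8), (-122.2, 37.8)],
}


def tract_for_point(lon, lat, tracts=TRACTS):
    """Return the ID of the first tract containing the point, else None."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(lon, lat, polygon):
            return tract_id
    return None


print(tract_for_point(-122.2063, 37.7446))  # TRACT_A
```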

160 of 162

Intersect

Tool
