Open source digital data collection for field sciences.
Isaac I. Ullah, PhD
San Diego State University
This work is licensed under a Creative Commons Attribution 4.0 International License.
Why Open Science?
“Often described as ‘open science’, these new norms include data stewardship instead of data ownership, transparency in the analysis process instead of secrecy, and public involvement instead of exclusion. … We believe that there is much to be gained, both for individual researchers and for the discipline, from broader application of open science practices. … [W]e have identified three elements of open science that cross-cut [these] themes: open access, open data, and open methods.” (emphasis added)
SAA Open Science Interest Group.
Wiki: https://osf.io/2dfhz/
Open Methods for Open Digital Data
The benefits of digital technology for the field sciences are obvious:
However, as we adopt these new technologies, we face an important choice. Will we support the idea of Open Science by ensuring that our digital data collection and manipulation workflows are:
Open Science, Digital Data, and Linux
If we are really serious about meeting these objectives, everything about our data needs to be open. That includes the methods for gathering, storing, manipulating, analyzing, and disseminating those data, right down to the source code of the software we use at every step. Our choice of operating system is an essential part of this chain, but one that field scientists perhaps do not often consider.
I’m a pretty good choice for open field-science!
The Bova Marina Archaeology Project and the Kazakh-American Archaeological Expedition
My experiences in these two projects exemplify an open-source approach to:
Field Data Collection Workflow
Field Work (morning): Collect Form Data; Collect GPS Data; Fly Drone Missions
Data Dump (afternoon): Daily Form Data Export; Process Drone Imagery; Daily GPS Data Export
Lab Work (evening): Update GIS Database; Basic Data Analysis and Summary; Export to Mobile GIS
Planning (evening): Plan Field Work; Create/Modify Forms; Plan Drone Missions
The Hardware I Use and Recommend
8” Android Tablet - Lenovo Yoga Tab
Bluetooth GPS - Bad Elf GNSS Surveyor
Quadcopter - DJI Mavic Pro
Camera - Olympus OM-D E-M1ii and 12-14mm f2.8 Pro Lens
Mobile Data Collection
Open Data Kit (ODK) and its fork, “GeoODK,” allow easy creation of custom forms that can be deployed on multiple mobile devices via the ODK Collect app.
1) Build form in a spreadsheet: LibreOffice Calc
2) Convert to ODK XML format: XLSForm (online or Python)
3) Distribute to devices: Geo/ODK Collect
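An XLSForm spreadsheet has a survey sheet (columns type, name, label) and a choices sheet (list_name, name, label) holding the options for select questions. Conversion tends to fail, or produce a confusing form, when a question name is repeated or a select question points at a choice list that does not exist. A minimal Python sketch of a pre-conversion check, with each sheet represented as a list of dicts; the finds-form field names are hypothetical:

```python
# Hypothetical survey and choices sheets for a simple finds form.
SURVEY_ROWS = [
    {"type": "text", "name": "find_id", "label": "Find ID"},
    {"type": "select_one material", "name": "material", "label": "Material"},
    {"type": "geopoint", "name": "location", "label": "Location"},
    {"type": "begin repeat", "name": "photos", "label": "Photos"},
    {"type": "image", "name": "photo", "label": "Photo"},
    {"type": "end repeat", "name": "", "label": ""},
]

CHOICES_ROWS = [
    {"list_name": "material", "name": "lithic", "label": "Lithic"},
    {"list_name": "material", "name": "ceramic", "label": "Ceramic"},
]


def check_form(survey, choices):
    """Return a list of problems found in XLSForm-style survey/choices rows."""
    problems = []
    # Question names must be unique (blank names, e.g. on "end repeat", are ignored).
    names = [row["name"] for row in survey if row["name"]]
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate name: {name}")
    # Every select question must point at a list defined in the choices sheet.
    lists = {row["list_name"] for row in choices}
    for row in survey:
        parts = row["type"].split()
        if parts and parts[0] in ("select_one", "select_multiple"):
            if len(parts) < 2 or parts[1] not in lists:
                problems.append(f"missing choice list for: {row['name']}")
    return problems
```

In practice the rows would be read from the Calc spreadsheet (for example after saving each sheet as CSV); an empty list means both checks passed.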
Mobile Data Collection
Collected data are aggregated into a central database and can be exported to common tabular and GIS formats. This is done with ODK Briefcase or ODK Aggregate.
1) Save completed forms: Geo/ODK Collect
2) Aggregate form data: ODK Briefcase or ODK Aggregate
3) Export database: CSV, SHP, etc.
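When several tablets or repeated exports contribute the same submission, the aggregated table can contain duplicates. ODK gives every submission a unique instance ID; the sketch below assumes the exported CSV carries it in a column named KEY (check the header of your own export) and keeps the first copy of each submission:

```python
import csv
import io


def merge_exports(csv_texts, key_field="KEY"):
    """Merge several CSV exports (given as text) into one list of row dicts,
    keeping only the first occurrence of each submission key."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = row[key_field]
            if key not in seen:  # skip submissions already merged
                seen.add(key)
                merged.append(row)
    return merged
```

Each text would typically come from reading one exported file, e.g. with Path.read_text().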
Mobile GIS
At the end of each field day, the form data are aggregated and exported, and a centralized GIS project is updated. Connecting QGIS with the QField app gives every tablet a queryable, editable, up-to-date copy of the GIS database in the field.
1) Daily export of ODK database: a daily CSV file with coordinates
2) Centralized QGIS project: styles, labels, layers, and a QGIS project file
3) GIS data in QField: layers can be hidden/shown
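QGIS can load a CSV with coordinate columns directly as a delimited-text layer. As an alternative, the daily export can be converted to GeoJSON, which keeps geometry and attributes together in one open, plain-text file. A standard-library sketch, assuming hypothetical column names latitude and longitude holding WGS 84 decimal degrees:

```python
import csv
import io


def csv_to_geojson(csv_text, lat_field="latitude", lon_field="longitude"):
    """Convert CSV rows with coordinate columns into a GeoJSON FeatureCollection.
    Rows with blank coordinates are skipped; other columns become properties."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = row.pop(lat_field), row.pop(lon_field)
        if not lat or not lon:
            continue  # e.g. a form saved before a GPS fix was taken
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}
```

The returned dictionary can be written with json.dump and added to the QGIS project as a vector layer.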
High Precision GPS Tracking
The free (but not open) Bluetooth GPS* app allows the GPS coordinates from the GNSS surveyor to wirelessly replace those from the internal GPS of the device. The GPS Logger app is a flexible solution to record your location in real time. For example, actual survey transect pathways can be recorded, and sweep widths calculated.
1) Bluetooth GPS connection: high-precision GPS data
2) Real-time GPS logging: each surveyor logs transect data
3) GPS track in QGIS: actual walked transect + sweep width
*An open-source alternative exists on SourceForge, but it is currently abandoned.
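Given a logged track as a list of (latitude, longitude) points, the length actually walked is the sum of great-circle distances between consecutive points, and multiplying it by the surveyor's sweep width gives a rough surveyed area. This is a simplified sketch assuming a constant sweep width; it double-counts ground wherever the path crosses or doubles back, and it is not necessarily how sweep widths were calculated in these projects:

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; adequate for short transects


def haversine_m(p1, p2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def transect_coverage(track, sweep_width_m):
    """Return (walked length in m, approximate surveyed area in m^2)
    for a logged track and a constant sweep width."""
    length = sum(haversine_m(a, b) for a, b in zip(track, track[1:]))
    return length, length * sweep_width_m
```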
Managing Field Photos
Rapid Photo Downloader greatly smooths the process of downloading images from multiple cameras and backing them up to an external disk. It automatically organizes the images into a user-definable folder tree and renames them using user-definable options, including EXIF tag information. Definable “job codes” can help further differentiate projects.
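The kind of renaming scheme Rapid Photo Downloader automates can be illustrated with a small function that builds a job/year/date folder tree and a timestamped file name. The layout and the job code are hypothetical, and this is not Rapid Photo Downloader's own code; in real use the capture time would come from each image's EXIF DateTimeOriginal tag:

```python
from datetime import datetime
from pathlib import PurePosixPath


def destination_path(capture_time: datetime, job_code: str, sequence: int,
                     extension: str = ".jpg") -> PurePosixPath:
    """Build a hypothetical Job/Year/Date/ folder tree and a file name that
    embeds the capture timestamp, job code, and a sequence number."""
    folder = PurePosixPath(job_code, f"{capture_time:%Y}", f"{capture_time:%Y-%m-%d}")
    name = f"{capture_time:%Y%m%d-%H%M%S}_{job_code}_{sequence:04d}{extension.lower()}"
    return folder / name
```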
Flight Planning for DJI drones
DJI provides an SDK and a well-documented API. DronePan is a libre flight-planner app for DJI drones, but it is not yet available on Android. There are, however, several gratis Android apps, including DJI’s own GO 4, DroneHarmony, and Aerobotics Flight Planner. I have created a libre spreadsheet calculator to help plan flights, but an automated app-based planner is much more convenient.
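The arithmetic behind such a spreadsheet calculator is standard for a nadir-pointing camera: the ground footprint scales with altitude divided by focal length, ground sample distance (GSD) is the footprint divided by the pixel count, and photo and flight-line spacing follow from the desired front and side overlap. A Python sketch of those relationships; the camera values are approximate figures assumed for the Mavic Pro's 1/2.3-inch sensor, not confirmed specifications:

```python
from dataclasses import dataclass


@dataclass
class Camera:
    sensor_width_mm: float
    sensor_height_mm: float
    focal_length_mm: float
    image_width_px: int
    image_height_px: int


def plan(camera, altitude_m, front_overlap, side_overlap):
    """Return GSD (cm/px), image footprint (m), and the photo and flight-line
    spacing (m) for the requested overlaps. Assumes a nadir camera with the
    long image side across the flight line."""
    footprint_w = camera.sensor_width_mm * altitude_m / camera.focal_length_mm
    footprint_h = camera.sensor_height_mm * altitude_m / camera.focal_length_mm
    gsd_cm = 100 * footprint_w / camera.image_width_px
    return {
        "gsd_cm": gsd_cm,
        "footprint_m": (footprint_w, footprint_h),
        "photo_spacing_m": footprint_h * (1 - front_overlap),
        "line_spacing_m": footprint_w * (1 - side_overlap),
    }


# Approximate Mavic Pro camera values (assumed; check your own camera's specs).
MAVIC_PRO = Camera(6.17, 4.55, 4.73, 4000, 3000)
```

With these assumed values, flying at 50 m gives a GSD of roughly 1.6 cm/px.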
Drone Image Processing
Open Drone Map can create georeferenced orthophotos, point clouds, textured 3D meshes, and georeferenced DEMs from unordered drone images with GPS tags. Although a decent computer helps, it’s really as easy as “load images and hit ‘run’.” In the field, processing can be sped up by downsizing the images, and the resulting 3D and imagery data are more than good enough for use in the field.
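One common way to run OpenDroneMap is through its Docker image, with the photos placed in an images folder inside a project directory. The sketch below only assembles the command, following the usage shown in the ODM README at the time of writing; the paths and project name are hypothetical, and processing options (including image resizing) vary between ODM releases, so check the current documentation before adding any:

```python
from pathlib import Path


def odm_command(datasets_dir, project_name):
    """Build (but do not run) a Docker command for OpenDroneMap, assuming the
    photos are in <datasets_dir>/<project_name>/images."""
    datasets = Path(datasets_dir).expanduser().resolve()
    return [
        "docker", "run", "-ti", "--rm",
        "-v", f"{datasets}:/datasets",  # mount the host folder into the container
        "opendronemap/odm",
        "--project-path", "/datasets", project_name,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine with Docker installed.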
Field Data Collection: Lessons Learned
This last summer marked my sixth field season of paperless data collection altogether. There are several valuable lessons I’ve learned during this time:
“Data Hygiene” and Post-Processing
Data produced through this workflow are in reasonably good condition. At the end of the field project, however, a few items of “data hygiene” must be done to correct for human error.
Once this is done, the final post-processing and analysis of the data products can be undertaken. Some of this post-processing can be automated, but it often takes a “human-touch.”
Fixing Form Data
2) Tracklogs must be connected or converted
3) Multiple photos are linked in a subsidiary table, and must be reconnected
4) Correction of typos and “autocorrect” mistakes
Example of a raw tracklog as exported, with vertices of “latitude longitude altitude accuracy” separated by semicolons (truncated): 43.467675 77.823259 856 3.81;43.467675 77.823261 856 3.81;43.467675 77.82327099999999 856 3.81;…
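Two of these fixes lend themselves to small scripts. ODK stores a geotrace as semicolon-separated “latitude longitude altitude accuracy” vertices, which most GIS software cannot read directly; converting it to WKT (note the longitude-first order) lets QGIS or GRASS import it as a line. Photos captured in a repeat group are exported to a separate table; the sketch assumes KEY / PARENT_KEY column names (as in ODK Briefcase CSV exports; verify against your own files) and a hypothetical photo column:

```python
from collections import defaultdict


def geotrace_to_wkt(trace):
    """Convert an ODK geotrace string ("lat lon alt acc;lat lon alt acc;...")
    into a WKT LINESTRING in lon/lat order, dropping altitude and accuracy."""
    coords = []
    for vertex in trace.strip().strip(";").split(";"):
        parts = vertex.split()
        if len(parts) < 2:
            continue  # skip empty or truncated vertices
        lat, lon = float(parts[0]), float(parts[1])
        coords.append(f"{lon} {lat}")
    if len(coords) < 2:
        raise ValueError("a LINESTRING needs at least two vertices")
    return f"LINESTRING ({', '.join(coords)})"


def attach_photos(forms, photos, key="KEY", parent_key="PARENT_KEY", photo_field="photo"):
    """Return copies of the form rows, each with a 'photos' list gathered from
    the repeat-group rows that point back to it."""
    by_parent = defaultdict(list)
    for row in photos:
        by_parent[row[parent_key]].append(row[photo_field])
    return [dict(form, photos=by_parent.get(form[key], [])) for form in forms]
```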
Fixing the GIS Database
The spatial aspects of the project are best managed in a GIS. GRASS and QGIS are the “dynamic duo” that I recommend. I use QGIS for daily use in the field and for making print-quality maps, and GRASS for all data analysis. They play nicely together.
Hygiene includes projection and topology correction, merging of data layers, and creation of proper metadata and “project files” with styling and layering. Post-processing includes advanced geospatial analyses, statistical analysis, and creation of publication-quality maps.
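Merging layers is normally done with tools such as QGIS's Merge Vector Layers or GRASS's v.patch. The idea can be sketched in plain Python for GeoJSON layers: concatenate the features and record where each one came from. The sketch assumes every input already uses the same coordinate reference system, which is why projection correction comes first:

```python
def merge_layers(layers):
    """Merge a {layer_name: FeatureCollection} mapping into one FeatureCollection,
    adding a 'source_layer' property to each feature. Inputs are not modified."""
    merged = []
    for layer_name, collection in layers.items():
        for feature in collection["features"]:
            props = dict(feature.get("properties") or {}, source_layer=layer_name)
            merged.append({**feature, "properties": props})
    return {"type": "FeatureCollection", "features": merged}
```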
Digital Asset Management (D.A.M.) and Imagery
Images need to be tagged and added to a searchable database. DigiKam makes this easy! It works well with the file structure produced by Rapid Photo Downloader. “Geotags” can be added from GPS tracks easily with GottenGeography. All tags can be stored as EXIF or XMP tags, so they travel with your images.
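Geotagging from a GPS track works by time correlation: each photo's timestamp, corrected for any offset between the camera clock and GPS time, is located between two track points and its position is interpolated. A generic sketch of that idea, not GottenGeography's own code; it assumes naive datetimes in the same time zone for photos and track, and skips photos taken during long gaps in the track:

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def geotag(photo_time, track, camera_offset=timedelta(0), max_gap=timedelta(minutes=5)):
    """Estimate (lat, lon) for a photo by linear interpolation between the two
    track points that bracket its offset-corrected timestamp.
    `track` is a time-sorted list of (datetime, lat, lon). Returns None when the
    photo falls outside the track or the bracketing points are too far apart."""
    t = photo_time + camera_offset
    times = [point[0] for point in track]
    i = bisect_left(times, t)
    if i < len(track) and times[i] == t:
        return track[i][1], track[i][2]  # exact match with a track point
    if i == 0 or i == len(track):
        return None  # before the first or after the last track point
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    if t1 - t0 > max_gap:
        return None
    f = (t - t0) / (t1 - t0)  # fraction of the way from t0 to t1
    return lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0)
```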
Qualitative Imagery Analysis
Features can be enhanced and isolated in field photographs using image enhancement software like GIMP and the G’MIC plugin.
Final Hi-Res Drone Image Processing
Running Open Drone Map in parallel == NERDVANA!
Processing High-Resolution Drone Imagery and Topography
GRASS GIS offers tools to analyze high resolution topography and imagery derived from aerial drone survey. Tools for LiDAR, terrain analysis, hydrology, and imagery analysis are particularly useful in post-processing these data.
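As an illustration of the terrain analysis involved, slope can be estimated from a gridded DEM with a 3x3 finite-difference window; Horn's weighting is a common choice. On real rasters, GRASS's r.slope.aspect does this job; the pure-Python sketch below is not its implementation and only handles interior cells of a small list-of-lists grid:

```python
import math


def horn_slope_deg(dem, cellsize):
    """Slope in degrees for the interior cells of a DEM given as a list of rows,
    using Horn's 3x3 finite-difference weights. Edge cells are left as None."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # 3x3 window: a b cc / d _ f / g h i (centre cell is not used)
            (a, b, cc), (d, _, f), (g, h, i) = (row[c - 1:c + 2] for row in dem[r - 1:r + 2])
            dzdx = ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
            dzdy = ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cellsize)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out
```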
Post-Processing: Lessons Learned
This year, I’ve had three students working on “data hygiene” and post-processing for these two projects.
Data Curation, Versioning, and Dissemination
The goal of open, reproducible science requires any project that generates data to curate, version, and disseminate that data. There are several tools available to facilitate this. Which one you choose will depend on the project’s goals and time/money budget, but some things to consider are:
The principles of open, reproducible science are best met through use of open-source software tools, employed through scripted workflows, that generate plain-text or open-standard data formats, with abundant, informative metadata, and released under permissive licensing.
Use of a Linux-based operating system offers a seamless and thorough avenue to achieve these goals.
Curating Data
Data curation consists of archiving and maintaining long-term copies of the data.
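A simple building block for long-term curation is a fixity manifest: a checksum for every archived file, recomputed later to detect corruption or accidental edits. Packaging standards such as BagIt rely on the same idea. A standard-library sketch using SHA-256:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to handle large images."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose digest has changed."""
    root = Path(root)
    return [rel for rel, digest in manifest.items()
            if not (root / rel).is_file() or sha256_of(root / rel) != digest]
```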
Versioning Data
Data versioning consists of tracking and recording all changes made to data over time.
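Version-control systems such as Git track line-by-line changes to plain-text files. For tabular field data it can also help to see changes record by record. A sketch that compares two versions of a CSV table on a key column (any column names used with it are up to the project):

```python
import csv
import io


def diff_records(old_csv, new_csv, key):
    """Compare two versions of a CSV table keyed on `key` and report which
    records were added, removed, or changed (with old/new field values)."""
    old = {r[key]: r for r in csv.DictReader(io.StringIO(old_csv))}
    new = {r[key]: r for r in csv.DictReader(io.StringIO(new_csv))}
    changed = {}
    for k in old.keys() & new.keys():
        fields = {f: (old[k].get(f), new[k].get(f))
                  for f in old[k].keys() | new[k].keys()
                  if old[k].get(f) != new[k].get(f)}
        if fields:
            changed[k] = fields
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }
```

For example, comparing a table before and after typo correction lists each corrected record under "changed" with its old and new values.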
Disseminating Data
Data dissemination consists of making data findable and publicly available.
Data dissemination: Lessons learned.
There is a plethora of ways to make data available. While a richness of options is a good thing, it makes finding the specific pieces of data you are interested in quite difficult. Rather than force people to use one method, I suggest the creation of community-driven catalogs, where links to data on similar topics can be centralized, curated, annotated, and shared by and for the community that wants to use them.
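Such a catalog need not be complex. A minimal sketch of the underlying data structure, entirely hypothetical: each entry links to data hosted elsewhere, carries community tags and an annotation, and can be found by tag:

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One curated link in a hypothetical community data catalog."""
    title: str
    url: str
    tags: set = field(default_factory=set)
    note: str = ""  # community annotation


class Catalog:
    def __init__(self):
        self.entries = []

    def add(self, entry):
        # Avoid listing the same dataset twice.
        if any(e.url == entry.url for e in self.entries):
            raise ValueError(f"already catalogued: {entry.url}")
        self.entries.append(entry)

    def find(self, *tags):
        """Return entries carrying all of the requested tags (case-insensitive)."""
        wanted = {t.lower() for t in tags}
        return [e for e in self.entries if wanted <= {t.lower() for t in e.tags}]
```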
www.cmaple.org
Thank You!
Thanks to the Bova Marina Archaeological Project team, the Kazakh-American Archaeological Expedition team, my students in the Computational Archaeology Lab at SDSU, all the development teams of all these wonderful pieces of F/LOSS, the organizers of SCaLE 16x, and to YOU!
More information, including links and downloads, can be found at my website: