Geo-Computation and Environmental Analysis 5-Day Workshop

Yale University, New Haven, USA

May 13-17, 2019

This open source spatiotemporal data analysis and processing workshop is an immersive 5-day experience that opens new horizons on the vast potential of the Linux environment and the command line approach for processing geographical data. We will guide beginners who have never used a command line terminal to a stage where they can understand and use very advanced open source data processing routines. Our focus is on strengthening self-learning, so that participants can keep progressing and updating their skills in a continuously evolving technological environment.

Trainers:

Dr. Giuseppe Amatulli (Yale University, USA, www.spatial-ecology.net)

Mr. Jacob Bukoski (University of California, Berkeley)

Course Requirements:

The workshop is aimed at students who are currently at the master's or doctoral level, as well as researchers with a common interest in spatiotemporal data analysis and modelling. Nonetheless, we accept undergraduate students as well. Participants should have basic computer skills and a strong desire to learn command line tools to process data. We expect students to have a special interest in geographical data analysis; previous experience with Geographic Information Systems would be helpful. Students need to bring their own laptops with a minimum of 4GB RAM and 30GB of free disk space.

Academic Program:

The workshop gives students the opportunity to develop the key skills required for advanced spatial data processing. Throughout the training, students will focus on developing the independent learning skills that are fundamental to continuously learning advanced data processing. This is an ongoing journey, driven by the availability of ever more complex data and by the ongoing technological revolution. During the course, many different, complementary and sometimes overlapping tools will be presented to provide an overview of the open source software available for spatial data processing. We will discuss their strengths, weaknesses and suitability for different data processing objectives (e.g. modelling, data filtering, querying, GIS analyses, graphics or reporting) and data types. In particular, we will guide students in practicing with different types of software and tools, with the aim of helping them through the steep learning curve that is generally experienced when first analyzing data in a command line programming environment. Broadly, we focus our training on helping students develop independent learning skills so that they can find online help, solutions and strategies to fix bugs, and progress independently with complex data processing problems.

The academic program is divided into the following areas of study and interactions:

Lectures (15 to 30 minutes each): Students will take part in a series of lectures introducing the basic functions of the tools, theoretical aspects and background information needed to better understand the deeper concepts that are subsequently applied in data processing.

Hands-on Tutorials: Students will be guided through hands-on sessions in which trainers perform data analyses on real case study datasets, while the students follow the example procedure on their own laptops. During tutorial sessions students are supported by two trainers, one giving the demonstration and one supervising the students' work and providing individual guidance on coding.

Hands-on Exercises: In addition to tutorials and lectures, students are encouraged to pursue their own independent study during the exercise sessions. Specific tasks will be set to reinforce the data processing skills presented in the lectures and practiced during the tutorial sessions. These exercise sessions equip students with the confidence and skills to become independent learners and to engage effectively with the demands of advanced spatial data processing.

Depending on the number of participants and their previous programming knowledge, more or fewer topics can be addressed according to the students' needs. The exercises and examples will be cross-disciplinary: forestry, landscape planning, predictive modelling and species distribution, mapping, nature conservation, computational social science and other spatially related fields of study. Nevertheless, these case studies are template procedures and can be applied to any thematic application or discipline.

Round table discussions: These sessions focus mainly on exchanging experiences, needs and points of view. We aim to clarify students' specific needs and challenges, and to focus on how to help and how to find solutions while problem solving.

Learning Objectives:

Our workshop will enable students to further develop and enhance their spatiotemporal data processing skills. Most importantly, it will allow them to start using a fully functional open source operating system and its software professionally. With continuous practice during the week, students will become more and more familiar with the command line and will focus on developing specific areas, including:

  • Developing a broad knowledge of existing tools and being able to judge which are most appropriate for their needs.

  • Building confidence in the use of several command line utilities for spatial data processing and of the Linux operating system.

  • Developing data processing skills and increasing knowledge on data types, data modelling and data processing techniques.

  • Encouraging independent learning, critical thinking and effective data processing.

Timetable (7hr teaching/day):

9:00 - 10:30     Morning session 1 (1h30min)
10:30 - 10:45    Coffee break
10:45 - 12:30    Morning session 2 (1h45min)
12:30 - 13:30    Lunch
13:30 - 15:00    Afternoon session 1 (2h)
15:00 - 15:15    Break
15:15 - 16:30    Afternoon session 2 (1h30min)

Summary

Day 1:

Introduction / OSGeo-Live operating system (GA).

Linux bash programming (GA).

Day 2:

Introduction to Python (JB).

Day 3:

Introduction to GDAL/OGR - PKTOOLS (GA).

Explore and work with your geographic data using GDAL/OGR - PKTOOLS (GA).

Day 4:

Introduction to GRASS (GA).

Using Python for GIS and Remote Sensing operations (JB).

Day 5:

Working on students' personal data (GA, JB).

Batch processing data using GRASS (GA) TBD (optional)

GIS and RS with R (JB) TBD (optional)

Parallel processing in R (GA) (optional)

Syllabus and Course Schedule

(some hours/topics may change according to the needs of the participants)

DAY 1

MORNING

Session 1. Getting started: Getting to know each other and the Linux OS (GA) (1.30 - 2.00 hours)

This session introduces the overall course program and the Linux operating system. We also learn how to install and use an operating system in a virtual environment.

  1. Course objectives and schedule (Lecture) (GA).
  2. Get to know each other: trainers and participants (GA, JB).
  3. The Linux environment: why and what to use for handling Geo-BigData (Lecture) (GA)
  4. Installation and introduction to OSGeo-Live (Hands-on tutorial) (GA)
  5. The www.spatial-ecology.net platform (Lecture) (GA)
  6. Lubuntu GUI and Unix/Linux command line (Hands-on tutorial) (GA)
  7. Monitor your processes: RAM and hard disk usage (Lecture) (GA).

AFTERNOON

Session 2. Jump start Linux Bash programming (GA) (3.00 - 4.00 hours)

During this session we explore and practice the basics of the Bash terminal command line. The acquired skills will be used in all following sessions.

  1. Why use the Unix/Linux command line? (Lecture)
  2. Command syntax and basic commands (Hands-on tutorial)
  3. Redirection of input/output
  4. Reading and exploring a text file
  5. Meta-characters and regular expressions and their use
  6. Concatenating processes (pipes)
  7. The use of variables and iteration (for loop, while)
  8. Exercise
  9. Downloading and unzipping files
  10. File management
  11. Basic bash scripts

Session 3. AWK programming language (if we have time) (GA) (0.30 - 1.00 hours)

This session is fundamental for data filtering and preparation, text file manipulation, descriptive statistics and basic mathematical operations on large files. Students will access, query, understand and clean up data, and perform data filtering using the bash command line. We use AWK, an extremely versatile and powerful programming language for working on text files, performing data extraction and reporting, or reducing data before importing them into R or other software.

  1. Why use the AWK command line (Lecture)
  2. The basic commands and command syntax (Hands-on tutorial)
  3. Built-in variables
  4. Importing variables
  5. String functions
  6. Numerical functions
  7. Query functions
  8. Manipulating large files before importing them into R
  9. Creating a raster ID with AWK
  10. Fast plotting of txt files using GNUPLOT
  11. Embedding AWK in GNUPLOT

DAY 2

MORNING

Session Recap: Review of the main commands of the Linux OS

  1. Use the commands from Day 1 to get familiar with the Linux OS (GA)
  2. The use of RStudio for bash scripting (Hands-on tutorial) (JB)

Getting started with Python (JB)

During this session we explore and practice the basics of the Python language. We will describe the fundamental structure of Python objects and how to build functions, modules and classes. The acquired skills will be used in all following sessions.

  1. Getting started with Python: installing and running it on various platforms, managing versions, programming environments and how to think the Python way (lecture and hands-on tutorial)
  2. Basic data types: integers, strings, lists, dictionaries (lecture).
  3. Flow control statements: if, while, for (lecture).
  4. Functions (lecture).
  5. Modules and packages (lecture).
  6. Basic NumPy (numerical Python) (lecture).
  7. Basic math operations (lecture).
  8. Basic tools: compression, web interfacing (lecture)
  9. Basic geo-applications: raster, shapefile (lecture)
  10. Hands-on tutorial and exercises on real code:
      01_Python_Intro.ipynb
      02_Geo_Python.ipynb
AFTERNOON

  1. Setting up your bash script
  2. Using arguments in bash scripts
  3. Setting up your Python script
  4. Transferring your data into the VM
  5. Starting to work with your data
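
As an illustration of items 2 and 3 above, a Python script that takes arguments could be laid out roughly as follows. This is only a sketch using the standard `argparse` module; the file name `elevation.tif` and the `--out-dir` option are hypothetical and not part of the course material.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Hypothetical data processing script."
    )
    parser.add_argument("input", help="path to the input file")
    parser.add_argument("--out-dir", default=".",
                        help="directory for the results")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The actual processing would go here; this sketch only reports its inputs.
    print(f"Processing {args.input}, writing results to {args.out_dir}")
    return args


if __name__ == "__main__":
    # In a real script, call main() so that sys.argv is used. An explicit
    # list is passed here so that the sketch runs as-is.
    main(["elevation.tif", "--out-dir", "results"])
```

Keeping the argument parsing in its own function makes the script easy to call from a bash loop and easy to test without touching the command line.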

DAY 3

MORNING

Session 5. Basic concepts of GIS and Remote Sensing (GA)

  1. Main representations of geographic features: rasters and vectors.
  2. Simple GIS concepts: overlay, intersect, buffer

Session 6. Exploring and understanding geographical data: introduction to GDAL/OGR libraries (GA)

This section introduces data manipulation for geospatial data processing on the command line using GDAL and OGR libraries.

  1. GDAL/OGR & PKTOOLS for raster and vector analysis (Lecture).
  2. Geographic projections database
  3. Raster and vector data formats and data types.
  4. GeoTIFF format
  5. OpenEV & QGIS for raster and vector visualization
  6. Command syntax
  7. Raster/vector data manipulation for multiple image processing using GDAL & PKTOOLS
  8. Calculating the average and standard deviation inside polygons
  9. The use of .VRT for splitting and merging images
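
The idea behind item 8 above (statistics of raster values inside polygons) can be sketched in plain Python. In the course this is done with GDAL/OGR and PKTOOLS on real rasters and polygons; here the two small grids, the zone IDs, the function name `zonal_stats` and the choice of the population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical 3x4 raster of values and a matching grid of polygon (zone) IDs;
# 0 marks cells that fall outside every polygon.
values = [
    [10.0, 12.0, 30.0, 32.0],
    [11.0, 13.0, 31.0, 33.0],
    [0.0, 0.0, 34.0, 36.0],
]
zones = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 2, 2],
]


def zonal_stats(values, zones, nodata_zone=0):
    """Return {zone_id: (mean, population standard deviation)}."""
    cells = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone != nodata_zone:
                cells[zone].append(value)
    return {zone: (mean(vals), pstdev(vals)) for zone, vals in cells.items()}


for zone, (avg, std) in sorted(zonal_stats(values, zones).items()):
    print(f"zone {zone}: mean={avg:.2f} std={std:.2f}")
```

Rasterizing the polygons to a grid of zone IDs first, as assumed here, is one common way to reduce the problem to a cell-by-cell grouping.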

AFTERNOON 

Session 7. Explore and work with your geographic data using GDAL/OGR & PKTOOLS (GA)

In this session, course participants can bring their own data and start to manipulate and analyze it using the tools presented so far. You can decide to work in a group or by yourself. We will assist you and suggest the best tools available.

  1. Visualize your data using OpenEV & QGIS
  2. Get/print the metadata of a raster or vector in the terminal
  3. Create a working directory and start to write your script.

DAY 4

MORNING

Session 9 - Command line GIS - Getting started with GRASS and QGIS (GA)

This session introduces the GRASS geographic information system through its command line interface for spatial data processing, and QGIS as a map visualization and overlay tool. We do not expect any previous knowledge of GRASS or QGIS, but we will use the basic Bash and GDAL command line skills acquired in the previous days. This session is of interest to people who deal with general GIS analysis, hydrological modelling, DEMs, vector/raster data integration, etc. It is fundamental for the topics of the following day.

  1. Introduction to the GRASS data structure and environment - (Lecture).
  2. GRASS and QGIS as learning tools - (Lecture).
  3. Accessing GRASS and links to QGIS - (Hands-on tutorial).
  4. Command syntax and general data handling commands.
  5. GRASS working environment and bash working directory.
  6. Location and mapset.
  7. Region settings.
  8. Raster and vector data import, export, display and conversion.
  9. Raster map calculator.
  10. Vector manipulation and processing.
  11. Production of map and table layouts for reporting.
  12. https://grasswiki.osgeo.org/wiki/GRASS_and_Python
  13. https://grasswiki.osgeo.org/wiki/GRASS_Python_Scripting_Library

AFTERNOON

Session 8. Using Python for GIS and Remote Sensing operations - (JB)

This session focuses on the use of Python for GIS and Remote Sensing operations. Python can be used as an alternative language to Bash to call the GDAL/OGR or PKTOOLS command line tools, but it can also be used to integrate the GDAL/OGR API into your workflow.

  1. Call bash commands (and therefore GDAL/PKTOOLS) from Python
  2. Import the gdal/ogr modules in Python
  3. Read and write raster data in Python
  4. Use NumPy to manipulate raster data
  5. Extract the unique pairs of values between two tifs/rasters and store the results in a txt file
  6. Additional sources:
     https://pcjericks.github.io/py-gdalogr-cookbook/
     http://www.gis.usu.edu/~chrisg/python/2009/
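
The logic of item 5 above can be sketched without any GIS library. The two nested lists below are hypothetical stand-ins for rasters of identical shape (real GeoTIFFs would be read with GDAL), and the function names and the output file name `unique_pairs.txt` are likewise assumptions for illustration.

```python
import os
import tempfile

# Hypothetical stand-ins for two rasters of identical shape, e.g. land cover
# classes at two dates.
raster_a = [
    [1, 1, 2],
    [2, 3, 3],
]
raster_b = [
    [5, 5, 5],
    [6, 6, 6],
]


def unique_pairs(a, b):
    """Return the sorted unique (a, b) value pairs found cell by cell."""
    pairs = set()
    for row_a, row_b in zip(a, b):
        pairs.update(zip(row_a, row_b))
    return sorted(pairs)


def write_pairs(pairs, path):
    """Write one space-separated pair per line to a text file."""
    with open(path, "w") as handle:
        for first, second in pairs:
            handle.write(f"{first} {second}\n")


pairs = unique_pairs(raster_a, raster_b)
out_path = os.path.join(tempfile.gettempdir(), "unique_pairs.txt")
write_pairs(pairs, out_path)
print(pairs)
```

Collecting the pairs in a set keeps memory use proportional to the number of distinct combinations rather than to the number of cells.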

DAY 5

MORNING

Session 9. Working on students' personal data

CONCLUSION: Focus on the students' project needs and on how to get going with free and open source tools for advanced spatial data processing.

Round table, question and answer session on specific data processing needs and on how to follow up the workshop using open source software as a daily working toolbox.

AFTERNOON

Parallel session 10a (ADVANCED): Batch processing data using GRASS (GA) TBD

This parallel session assumes a good understanding and reasonable command of the Bash command line, and full knowledge of GRASS locations and mapsets. It is mainly intended for people who are already moving in the direction of large data processing and are willing to explore batch job processing and multi-core data manipulation for Geo-Computation.

  1. Creating a GRASS70 Location using an ancillary layer - (Hands-on tutorial).
  2. Setting GRASS70 variables for a GRASS bash job - (Hands-on tutorial).
  3. Create a Location, enter GRASS and import data - (Exercise).

Parallel session 10b (INTERMEDIATE): GIS and RS with R (JB) TBD

This session shows the use of R for GIS and RS analysis, mainly using raster packages. A brief talk is followed by a short case study.

  1. Land cover change modeling (Hands on tutorial)
  2. Generating raster palette for visualization.
  3. Processing covariates for land cover change modeling.
  4. Map comparison via Crosstab and raster algebra.
  5. Modeling of land change with logistic model.
  6. Model evaluation with ROC and TOC metrics.

Parallel session 10c (ADVANCED): Parallel processing in R (GA) TBD

In this session, the use of R for parallel statistical computation will be introduced. A lecture will introduce the concept of parallelization in R and present the different R libraries available. This session assumes a basic knowledge of R.

  1. Parallel computation (lecture)
  2. R libraries for parallelization (lecture)
  3. Parallel computation in GIS application under R (Hands on tutorial)