Course Program

Timetable (7h teaching/day)

 9:00 - 10:45   morning session 1         1h45

10:45 - 11:05  coffee break

11:05 - 12:50  morning session 2        1h45        

12:50 - 14:00  Lunch

14:00 - 15:45  afternoon session 1        1h45

15:45 - 16:00  break

16:00 - 17:45  afternoon session 2        1h45

DAY 1 - 15th of June


This section introduces the overall course program and our operating system environment. We then establish the basic knowledge of BASH command lines. The acquired skills will be used in all the other sections, so everybody needs to follow it.

Introduction (1h45min - Giuseppe - Stefano)

  1. Welcome speech  (Prof. Mirizzi,  DiCEM Department Director).
  2. Knowing each other: backgrounds of trainers and participants  (G.S.)

             (Copy the LVM folder - Install VirtualBox-4.3.28-100309-Win.exe or VirtualBox-4.3.28-100309-OSX.dmg) 

  3. Identifying participant expectations (S.)
  4. Course objectives and schedule (S.)
  5. The multi-session program organization (S.)
  6. Linux environment: why and what to use for handling Big Data (Lecture) (G.)
  7. Installation and introduction to the Linux Virtual Machine (Hands-on) (G.)
  8. Platform (Lecture) (S.)
  9. Lubuntu GUI and Unix/Linux command line (S.)
  10. The use of Kate as an editor (alternatives: Vi, Nano, Leafpad, gedit and Emacs) (S.)


UNIX/LINUX Bash programming (Introduction) (1h45min - G.)

  1. Unix/Linux command line (G.)
  2. Command syntax and basic commands (G.)
  3. Redirection of input/output (G.)
  4. Reading and exploring a text file (G.)
  5. Meta-characters and regular expressions, and their use (G.)
  6. Concatenating processes (pipes) (G.)
  7. The use of variables (G.)
  8. String manipulation (G.)
  9. Iteration (for loop, while) (G.)
  10. Exercise (G.)
  11. File management (S.)
  12. Downloading and unzipping files (S.)
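A minimal sketch of several of the topics above (redirection, pipes, variables, string manipulation and a loop); the file and variable names are illustrative:

```shell
# Create a small sample text file (illustrative data)
printf 'id,value\n1,10\n2,20\n3,30\n' > sample.csv

# Redirection: send command output to a file; then read it back
grep -c ',' sample.csv > count.txt
cat count.txt

# Variables and string manipulation
f="sample.csv"
echo "${f%.csv}"          # strips the extension, leaving "sample"

# Iteration with a pipe: loop over the lines, skipping the header
tail -n +2 sample.csv | while IFS=, read -r id value; do
    echo "row $id has value $value"
done
```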

DAY 2 - 16th of June


This section is fundamental for data filtering and preparation, bulk data download, text file manipulation, preliminary data visualization, descriptive statistics and basic mathematical operations on large files. The objective is to access and clean up data so that they are ready to be integrated using different languages and libraries in the bash environment. We use AWK, an extremely versatile and powerful programming language for working on files and performing data extraction and reporting; it is extremely useful for squeezing data before importing them into R or other software.

Recap BASH (G. - 20min)

AWK Programming language introduction (G. 3h)

  1. Handling and understanding PC performance (Lecture) (G.)
  2. Why use the AWK command line (S.)
  3. Basic commands and command syntax
  4. Built-in variables
  5. Importing variables
  6. String functions
  7. Numerical functions
  8. Query functions
  9. Manipulating large files before importing them into R
  10. Exercise
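A minimal AWK sketch touching built-in variables (NR, NF), a string function and a numeric summary; the data file is illustrative:

```shell
# Sample whitespace-separated table (illustrative data)
printf 'alice 10\nbob 20\ncarol 30\n' > scores.txt

# NR = current record number, NF = number of fields on the line;
# toupper() is a string function; sum accumulates column 2,
# and the END block prints the total after the last line
awk '{ sum += $2; print NR, toupper($1), NF }
     END { print "total", sum }' scores.txt
```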


This section introduces command-line data manipulation for spatio-temporal data processing, using multi-temporal raster maps. We assume a basic knowledge of Geographic Information Systems and Remote Sensing concepts (projection, pixel resolution, etc.). We also introduce the fundamentals of using pktools for remote sensing image analysis.

GDAL/OGR introduction (setting the basis for advanced use) (2h45 - G.S.)

  1. GDAL/OGR for raster and vector analysis
  2. Command syntax
  3. Raster/vector data manipulation
  4. Scripting GDAL/OGR functions in loops for multiple image processing
  5. Exercise
  6. Overview of Openforis
  7. Example raster image calculator using oft-calc
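The looping pattern of point 4 can be sketched as follows. The gdal_translate options (-of, -co) are real, but the scene file names are illustrative placeholders and the commands are only echoed (a dry run), so the pattern is visible without sample rasters:

```shell
# Dry run: placeholder rasters created with touch
touch scene_2001.tif scene_2002.tif scene_2003.tif

# Build one gdal_translate command per input raster;
# drop the leading "echo" to actually execute the commands
for f in scene_*.tif; do
    echo gdal_translate -of GTiff -co COMPRESS=DEFLATE "$f" "compressed_${f}"
done
```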

GNUPLOT introduction (45min - G.)

  1. Command syntax
  2. Simple graphic
  3. Integrate gnuplot with Bash and AWK
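A small sketch of the Bash/AWK/gnuplot integration: AWK generates a data file and the shell writes a matching gnuplot script. The plot script is only written out here, not rendered, so the sketch runs even without gnuplot installed:

```shell
# AWK generates the data: x and x squared, one pair per line
awk 'BEGIN { for (i = 1; i <= 10; i++) print i, i * i }' > data.txt

# Write a gnuplot script that plots the data to a PNG
cat > plot.gp <<'EOF'
set terminal png
set output 'plot.png'
set xlabel 'x'
set ylabel 'x squared'
plot 'data.txt' using 1:2 with lines title 'x^2'
EOF

# To render the figure: gnuplot plot.gp
```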

By the end of day 2, you will be able to integrate bash, AWK and GDAL in batch processing. It is important to feel confident with programming basics before attending the advanced parallel sessions. If you require more training in bash/AWK, you should attend the “basic” parallel sessions, which slowly integrate the different languages. Conversely, if you feel proficient in the subject, you can skip the basic sessions and go to the advanced ones.

DAY 3 - 17th of June

Starting from day 3, you can choose to keep working on basic programming that progressively integrates the different languages; otherwise, if you feel comfortable with the command lines seen during days 1 and 2, you can skip the basic sessions and go to the advanced ones.


This section will introduce the QGIS GUI and the GRASS command line for beginners. It does not assume any preliminary knowledge of GRASS or QGIS, but will use the basic BASH and GDAL command-line skills acquired on days 1 and 2. This session is of interest to people who deal with general GIS analysis, hydrological modelling, DEMs, vector/raster data integration, etc.

Recap Bash/AWK (10min - G.) - All students

Introduction to the GRASS data structure and environment

(Lecture, 20min - S. - All students)

Parallel session 1: GRASS and QGIS introduction (hands-on, 3h - S.)

  1. Introduction to Quantum GIS
  2. Quantum GIS plugins
  3. QGIS as GUI to GRASS and as a learning tool
  4. Accessing GRASS
  5. Command syntax and general commands of data handling
  6. Raster and vector data import, export, display and conversion
  7. Raster map calculator
  8. Vector manipulation and processing
  9. Production of maps and tables layout for reporting
  10. Exercise

Parallel session 2: GRASS advanced (3h - G.)

This section assumes a good understanding and a fair command of the Bash command line seen during days 1 and 2. It is mainly intended for people who are already moving in the direction of large data processing and are willing to explore batch job processing and multicore data manipulation for Geo-Computation.

  1. Scripting in GRASS: combining BASH commands with GRASS commands
  2. GRASS/BASH Environment variables
  3. GRASS in batch mode
  4. Exercise
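A sketch of GRASS in batch mode: a job script holds the GRASS commands, and the GRASS_BATCH_JOB environment variable (one common mechanism for non-interactive runs) points GRASS at it. GRASS 7 command syntax is assumed, and the map names and location path are illustrative; the final invocation is only echoed here:

```shell
# Write the GRASS commands to a job script
cat > grass_job.sh <<'EOF'
#!/bin/bash
# Commands executed inside the GRASS session
g.region raster=elevation
r.mapcalc "high = if(elevation > 100, 1, null())"
EOF
chmod +x grass_job.sh

# Typical invocation (shown, not executed): set GRASS_BATCH_JOB,
# then start GRASS on a location/mapset
echo 'GRASS_BATCH_JOB=$PWD/grass_job.sh grass ~/grassdata/mylocation/PERMANENT'
```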

AFTERNOON (All students)

Geospatial data processing in QGIS

Introduction to: QGIS plugins, processing toolbox and graphical model builder

(3h30 - P.D.)

  1. Standard built in utilities (via menu)
  2. Python console
  3. Plugins
  4. Processing toolbox
  5. Graphical modeler
  6. Batch processing

DAY 4  - 18th of June


Recap Geospatial data processing in QGIS

 (All students) - Pktools (30min - Daniel / Pieter)

Statistical oriented sessions (R)

Parallel session 1: R introduction (setting the basis for advanced use) (1h45 - Stefano)

  1. Introduction to R environment

  2. R structure, libraries, scripting and getting help

  3. Command syntax, R objects

  4. Basic commands (input, output, data creation)

  5. Data manipulation

  6. Plotting data and graphical parameters

  7. Iterations (if/ifelse conditions, for loop, while)

  8. Exercise

  9. Intro to most relevant R Spatial - Temporal libraries
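A minimal R sketch touching points 3-7 (objects, data creation, manipulation, a condition, a loop and a plot). Since this is a hands-on session, the script is only written to a file here; the object names are illustrative:

```shell
cat > intro.R <<'EOF'
# Basic objects and data creation
x <- c(2, 4, 6, 8)
df <- data.frame(id = 1:4, value = x)

# Data manipulation with a condition, then a for loop
df$double <- ifelse(df$value > 4, df$value * 2, df$value)
for (i in seq_len(nrow(df))) {
    cat("row", i, "value", df$value[i], "\n")
}

# Plotting with graphical parameters, written to a PNG file
png("values.png")
plot(df$id, df$value, type = "b", col = "blue", xlab = "id", ylab = "value")
dev.off()
EOF

# Run with: Rscript intro.R
```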

Parallel session 2: R advanced (basic familiarity with the R environment required) (1h45min - Giuseppe)

  1. Importing/Exporting geo-data

  2. Spatio-temporal and ecological modeling libraries

  3. A linear model and step-wise regression

  4. More complex algorithms (CART / Random Forest)

  5. Model prediction

  6. Exercise

Vector data manipulation (All students):

Spatialite for advanced vector data analysis (1h45 - Daniel).

  1. Overview of SQLite and Spatialite
  2. Using GDAL/OGR with Spatialite
  3. SQLite SQL dialect
  4. Spatial operations using Spatialite
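A sketch of a SpatiaLite session. AddGeometryColumn, MakePoint and ST_AsText are standard SpatiaLite functions, while the table, column names and coordinates are illustrative; the SQL is only written to a file here:

```shell
cat > sites.sql <<'EOF'
-- Create a table with a point geometry column (EPSG:4326)
CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
SELECT AddGeometryColumn('sites', 'geom', 4326, 'POINT', 'XY');

-- Insert a point and read it back with a spatial function
INSERT INTO sites (id, name, geom)
VALUES (1, 'station A', MakePoint(16.6, 40.66, 4326));
SELECT name, ST_AsText(geom) FROM sites;
EOF

# Run with: spatialite sites.db < sites.sql
```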

AFTERNOON  (All students):

I. Recap R for spatio-temporal data analyses (30min - Stefano)

II. Raster analysis in R  (1h30 - Giuseppe) (All students)

  1. Raster library, performance and limitations

  2. Model training and prediction using raster data

  3. GRASS and R integration

III. Creating graphics with ggplot2 - optional session 1 (1h30 - Daniel)

  1. Basics of plotting (non)-spatial data

  2. Introduction to ggplot2

  3. Creating high quality graphics with ggplot2

  4. Exercises

IV. LiDAR data processing - optional session 2 (1h30 - Pieter)

  1. Introduction to LiDAR
  2. LAS and LAZ data formats
  3. Processing LAS data: case study on LiDAR products in forestry
     1. digital surface model (DSM)
     2. digital terrain model (DTM)
     3. vegetation height model (VHM)
     4. percentage height values (PHV)
     5. intensity map

DAY 5 - 19th of June


Recap (all students): ggplot / R spatio-temporal data analyses (30min - Daniel / Giuseppe)

Advanced geospatial data processing:

land cover classification (4h30 - Pieter and Daniel)

  1. Building processing workflows via bash scripts

  2. GDAL/OGR, Orfeo Toolbox, pktools

  3. Data pre-processing using GDAL/OGR, Spatialite

  4. Creating training data

  5. Unsupervised classification using Orfeo Toolbox
  6. Supervised classification using pktools

  7. Accuracy assessment using pktools/Spatialite/QGIS


Advanced geospatial data processing:

image filtering, atmospheric correction, segmentation (Optional, Pieter/Daniel 3h)

  1. Overview of OTB utilities & pktools

  2. Image filtering in spatial domain

  3. Image filtering temporal/spectral domain

  4. Image segmentation

  5. Atmospheric corrections

Lecture “Introduction to the OSM project”. Francesco Lovergine  - CNR Bari. (1h)

Social Dinner

Day 6 - Saturday

Work with your data

Participants can bring their own data and processing problems.

1. Students’ projects:

We aim to select 4 to 8 datasets and to document a processing routine that answers a specific task. This section offers a unique opportunity for participants to learn how open source tools can solve their analysis problems. Solutions will be presented and explained in detail, depending on the quantity and complexity of the problems. Data privacy will be respected according to the data policy at hand.

2. Open session: Raspberry Pi Hadoop cluster

3. Introduction to Google Earth Engine

4. Wrap up with a summary of the material discussed in the course.

Optional session

Language/software integration session (advanced geospatial data processing in cluster computing - really advanced) (2h30min - Giuseppe).

  1. Cluster/grid parallel processing on the Virtual Machine

  2. Simple exercise on cluster computing

  3. Combine multiple languages in a scripting-process

  4. Template for building scripts

  5. Import and export variables from different languages

  6. Integrate GRASS and R in a bash script

  7. The use of GRASS in cluster computing

  8. Working in tiles for Cluster/Grid/Parallel processing

  9. Cloud computing Amazon web services

  10. Advanced programming for cluster computation

  11. Parallel programming

  12. Using xargs/parallel and qsub for job queuing

  13. Exercise
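The xargs pattern of point 12 can be sketched as follows; the -P flag sets the number of concurrent jobs, and the tile file names are illustrative:

```shell
# Placeholder input tiles (illustrative, created empty)
touch tile_01.txt tile_02.txt tile_03.txt tile_04.txt

# Run one job per file, at most 2 in parallel (-P 2); each job
# writes the file's byte count into a companion .size file.
# GNU parallel equivalent: parallel 'wc -c < {} > {}.size' ::: tile_*.txt
ls tile_*.txt | xargs -P 2 -I {} sh -c 'wc -c < "{}" > "{}.size"'

ls tile_*.size
```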