
Spatial ecology in collaboration with the University of Basilicata - DiCEM (Dipartimento delle culture europee e del mediterraneo).

International Summer School:

Geocomputation using free and open source software

Jump start with R, GRASS, Python, GDAL/OGR libraries and the Linux operating system

Matera, Italy,  19th-23rd June 2017

The open source spatio-temporal data analysis and processing summer school is an immersive 5-day experience that opens new horizons on the vast potential of the Linux environment and the command line approach to data processing. We will guide newcomers who have never used a command line terminal to a stage that allows them to understand and use very advanced open source data processing routines. Our focus is on fostering self-learning, allowing participants to keep progressing and updating their skills in a continuously evolving technological environment.

Trainers:

Dr. Giuseppe Amatulli (Yale University, USA; www.spatial-ecology.net).

Dr. Stefano Casalegno (University of Exeter, UK; www.spatial-ecology.net).

Dr. Francesco Lovergine (CNR Bari, Italy)

Course requirements:

The summer school is aimed at students who are currently at the master or doctoral level, as well as researchers and professionals with a common interest in spatio-temporal data analysis and modelling. Nonetheless, we accept undergraduate students as well. Participants should have basic computer skills and a strong desire to learn command line tools to process data. We expect students to have a special interest in geographical data analysis; previous experience in Geographic Information Systems would be helpful. Students need to bring their own laptops with a minimum of 4GB RAM and 30GB free disk space.

Registration:

Registration is on a first come, first served basis and will be closed when 25 participants have signed up. Therefore, we encourage participants to register as soon as possible. A waiting list will be established if the limit is exceeded.

Academic programme:

The summer school provides students with the opportunity to develop key skills required for advanced spatial data processing. Throughout the training, students will focus on developing the independent learning skills that are fundamental for continuous learning in advanced data processing. This is an ongoing journey, driven by the availability of more complex data and the continuing technological revolution. Within the course, many different, complementary and sometimes overlapping tools will be presented to provide an overview of the existing open source software available for spatial data processing. We will discuss their strengths, weaknesses and specificity for different data processing objectives (e.g. modelling, data filtering, queries, GIS analyses, graphics or reporting) and data types. In particular, we will guide students to practice with different types of software and tools, with the objective of helping them through the steep learning curve that is generally experienced when first analysing data within a programming command line environment. Broadly, we focus our training on helping students to develop independent learning skills to find online help, solutions and strategies to fix bugs and to progress independently with complex data processing problems.

The Academic Programme is divided into the following areas of study and interactions:

Lectures: (15min to 1h each) Students will take part in a series of lectures introducing the basic functions of the tools, theoretical aspects and background information, which are needed for a better understanding of the deeper concepts subsequently applied in data processing.

Hands on Tutorials: Students will be guided during hands on sessions where trainers will perform data analyses on real case study datasets, while the students follow those example procedures on their laptops. During tutorial sessions students are supported by two trainers, one for the demonstrations and one to supervise the students' work and provide individual guidance on coding.

Hands on Exercises: In addition to tutorials and lectures, students are encouraged to take up their own independent study during the exercise sessions. Specific tasks will be set to reinforce the data processing skills presented in lectures and practised during the tutorial sessions. Such exercise sessions equip students with the confidence and skills to become independent learners and to engage effectively with the demands of advanced spatial data processing.

Depending on the number of participants and their previous programming knowledge, more or fewer topics can be addressed in accordance with the students' needs. The exercises and examples will be cross-disciplinary: forestry, landscape planning, predictive modelling and species distribution, mapping, nature conservation, computational social science and other spatially related fields of study. Nevertheless, these case studies are template procedures and can be applied to any thematic application or discipline.

Round table discussions: these sessions are mainly focused on exchanging experiences, needs and points of view. We aim to clarify specific students’ needs and challenges, and to focus on how to help and how to find solutions while problem solving.

Learning objectives:

Our summer school will enable students to further develop and enhance their spatio-temporal data processing skills. Most importantly, it will allow them to start using a fully functional open source operating system and its software professionally. With continuous practise during the week, students will become more and more familiar with the command line and will focus on developing specific areas, including:

  • Developing a broad knowledge of existing tools and being able to judge the most appropriate for their needs.
  • Building confidence with the use of several command line utilities for spatial data processing and with the Linux operating system.
  • Developing data processing skills and increasing knowledge on data types, data modelling and data processing techniques.
  • Encouraging independent learning, critical thinking and effective data processing.

Summer school certification: Upon successful completion of the course, attendees will receive a course certificate, although it is up to each participant’s university to recognize this as official course credit.

Time table: (7h teaching/day)

9:00 - 10:45    Morning session 1 (1h45min)
10:45 - 11:05   Coffee break
11:05 - 12:50   Morning session 2 (1h45min)
12:50 - 14:00   Lunch
14:00 - 15:45   Afternoon session 1 (1h45min)
15:45 - 16:00   Break
16:00 - 17:45   Afternoon session 2 (1h45min)

Summary

Day 1:

Knowing each other / OSGeo-live operating system

Linux bash programming

Day 2:

AWK basics; Gnuplot plotting; GDAL/OGR, Pktools and OpenForis geospatial libraries

Day 3:

Spatial data processing with Python

Day 4:

R basics

R advanced

GRASS basics

GRASS advanced

Python Deep Learning

Hands on spatial ecology applications: geospatial data processing and modelling.

R environment for statistics and graphics

QGIS and GRASS Geographic Information Systems

Smart analytics with python

Day 5:

Hydrological modelling; species distribution models; remote sensing image processing; spatio-temporal statistics.

Working on students' needs

DAY 1

MORNING

Session 1. Getting started: Knowing each other and LINUX OS (Amatulli, Casalegno)

This session introduces the overall course programme and the Linux operating system. We also learn how to install and use a virtual machine operating system.

  1. Get to know each other: trainers and participants - Identifying participant expectations and needs (Round-table).
  2. Course objectives and schedule.
  3. Linux environment: why and what to use for handling big data (Lecture) (A.)
  4. Installation and introduction to the Linux Virtual Machine (Hands-on tutorial) (A.)
  5. The www.spatial-ecology.net platform (Lecture) (A.)
  6. Lubuntu GUI and Unix/Linux command line (Hands-on tutorial) (C.)
  7. The use of kate as an editor (Hands-on tutorial) (C.)

AFTERNOON

Session 2. Jump start LINUX Bash programming (Amatulli)

During this session we explore and practice the basics of the Bash terminal command line. The acquired skills will be used in all the following sessions.

  1. Unix/Linux command line - (Lecture).
  2. Command syntax and basic commands - (Lecture).
  3. Redirection of the input/output - (Hands on tutorial).
  4. Read and explore a text file - (Hands on tutorial).
  5. Meta-characters and regular expressions, and their use - (Hands on tutorial).
  6. Concatenate processes (pipe) - (Hands on tutorial).
  7. The use of variables - (Hands on tutorial).
  8. String manipulation - (Hands on tutorial).
  9. Iteration (for loop, while) - (Hands on tutorial).
  10. Data manipulation with Bash: exploratory data analysis on the European forest fire database - (Exercise).
  11. Command line file management - (Exercise).
  12. Download, unzip and manipulate multiple files using the command line - (Exercise).

Session 5. Discovering the power of AWK programming language (Amatulli).

This session is fundamental for data filtering and preparation, bulk data download, text file manipulation, descriptive statistics and basic mathematical operations on large files. Students will access, query, understand and clean up data, and perform data filtering using the bash command line. We use AWK, an extremely versatile and powerful programming language for working on text files, performing data extraction and reporting, or reducing data before importing them into R or other software.

  1. Welcome to the AWK world. Why use the AWK command line - (Lecture).
  2. The basic commands, command syntax - (Lecture).
  3. Built in variables - (Hands on tutorial).
  4. Import variables - (Hands on tutorial).
  5. String functions - (Hands on tutorial).
  6. Numerical functions - (Hands on tutorial).
  7. Query functions - (Hands on tutorial).
  8. Manipulate large files before importing in R - (Hands on tutorial).
  9. Hands on AWK: Spatio-temporal data analyses of the European forest fire database - (Exercise).

Session 6. Getting started with GNUPLOT (Amatulli).

This session introduces the command-line driven graphic utility Gnuplot. Even though it has very sophisticated graphical options for final layout definition, Gnuplot is above all a very powerful tool for rapid and effective preliminary data visualization. It is embedded in the bash-awk terminal, so we can combine data filtering, random data extraction and many other operations with quick visualisation of data sets for preliminary analyses.

  1. Accessing Gnuplot, the Gnuplot syntax - (Lecture).
  2. 2D & 3D data plots - (Hands on tutorial).
  3. Combine AWK and Gnuplot - (Hands on tutorial).
  4. Plot 3D Lidar data with Gnuplot - (Exercise).

DAY 2

MORNING

Session 7. Exploring and understanding geographical data: introduction to GDAL/OGR, Pktools and OpenForis libraries (Amatulli).

This section introduces data manipulation for geospatial data processing on the command line using GDAL and OGR libraries.

  1. GDAL/OGR & PKTOOLS & OFT-TOOLS for raster and vector analysis - (Lecture).
  2. Geographic projections database.
  3. Raster and vector data formats and data types.
  4. GeoTIFF format.
  5. OpenEV & QGIS for raster and vector visualization.
  6. Scripting GDAL/OGR functions in loops for multiple image processing - (Hands on tutorial).
  7. Raster/vector data manipulation for multiple image processing using GDAL & PKTOOLS & OFT-TOOLS.
  8. Calculate average and standard deviation inside the polygons.
  9. Download MODIS Land Surface Temperature (*.hdf) from the NASA FTP.
  10. Example of raster and vector data processing operations with OpenForis (Exercise).
  11. Languages/Software data integration: GDAL, PKTOOLS and R.
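
Item 8 above (average and standard deviation inside polygons) is a zonal statistic. The course itself works with GDAL, PKTOOLS and OFT-TOOLS on the command line; the sketch below is only a plain-Python illustration of the underlying idea, not the course material. It assumes the polygons have already been rasterized onto the same grid as the values, that zone 0 means "outside any polygon", and that the function name zonal_mean_std is hypothetical.

```python
import math
from collections import defaultdict


def zonal_mean_std(values, zones, nodata=None):
    """Return {zone_id: (mean, population_std)} for two aligned 2D grids.

    values: list of rows with the raster cell values.
    zones:  list of rows with an integer polygon id per cell
            (assumption: 0 marks cells outside every polygon).
    nodata: optional value to skip in the values grid.
    """
    cells = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone == 0 or value == nodata:
                continue  # cell is outside all polygons or has no data
            cells[zone].append(value)

    stats = {}
    for zone, zone_values in cells.items():
        mean = sum(zone_values) / len(zone_values)
        variance = sum((v - mean) ** 2 for v in zone_values) / len(zone_values)
        stats[zone] = (mean, math.sqrt(variance))
    return stats


# Toy 2 x 3 raster with two rasterized polygons (ids 1 and 2).
raster = [[1, 2, 3],
          [4, 5, 6]]
polygons = [[1, 1, 2],
            [1, 0, 2]]
result = zonal_mean_std(raster, polygons)
```

Here the standard deviation is the population form (dividing by the number of cells); a sample standard deviation would divide by the number of cells minus one instead.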

DAY 3

MORNING

Session 3. Jump start with Python language (Lovergine) (All students) 

  1. Getting started with Python: install and run on various platforms, managing versions, programming environments and how to think the Python way (lecture and hands on tutorial).
  2. Types, variables and operations (lecture).
  3. Statements and syntax (lecture).
  4. Functions and generators (lecture).
  5. Modules and packages (lecture).
  6. Classes and pythonic OO programming (lecture).
  7. Exceptions and tools (lecture).
  8. Hands on tutorial and exercises on real code.

AFTERNOON

Session 4. Geospatial Python (Lovergine) (All students) 

  1. Basic and not so basic geospatial/add-on packages (lecture).
  2. Numpy (numeric python), Scipy (scientific python) and friends in brief (lecture).
  3. Using GDAL/OGR via Python (Hands on tutorial).
  4. Analysis of gdal-python tools as case studies (Hands on tutorial).
  5. Hands on exercises with real code (Exercise).

DAY 4

Starting from day 3, we present two parallel sessions that require different programming levels. You can select one based on your programming experience or thematic focus.

MORNING

Parallel session 8a. Getting started with the R environment for statistical computing, modelling and graphics.

In this session the use of R for statistical computation will be introduced. No previous knowledge of R is expected. Rather than concentrating on ready-made scripting routines, we will focus on the different R data structures and how to open, query and plot data easily. Students who are already familiar with R can skip this session and either receive supervision on their own data processing needs or follow an advanced R hands-on exercise.

  1. Introduction to R environment and user community - (Lecture).
  2. R structure, libraries, scripting and getting help - (Hands on tutorial).
  3. Command syntax, R objects - (Hands on tutorial).
  4. Basic commands (input, output, data creation, exporting data) - (Hands on tutorial).
  5. Data manipulation, slicing and extracting data - (Hands on tutorial).
  6. Plotting data and graphical parameters - (Hands on tutorial).

Parallel session 8b.  Spatial and temporal data analyses in R.

This session requires previous knowledge of R.

  1. Introduction to the most relevant R spatio-temporal libraries - (Lecture).
  2. The R “raster” library, performance and limitations - (Round-table).
  3. Importing/Exporting geo-data - (Hands on tutorial).
  4. Model training and prediction using raster data - (Hands on tutorial).
  5. Grass and R integration - (Hands on tutorial).
  6. Multivariate analyses: auto-ecology and synecology of Swiss stone pine in a graphic (Exercise).
  7. Kappa statistics of categorical maps - (Exercise).
  8. Spatial-covariance between multiple layer maps - (Exercise).
  9. Measuring and capturing the spatial-autocorrelation: Moran’s I and semivariograms - (Exercise).
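
To make the last exercise more concrete, here is a minimal sketch of Moran's I for a raster-like grid. The course exercise is carried out in R; this standalone Python version is only an illustration. It assumes rook adjacency (each cell is a neighbour of the cells directly above, below, left and right) with all neighbour weights equal to 1, and the function name morans_i is hypothetical.

```python
def morans_i(grid):
    """Moran's I of a 2D grid (list of rows) using rook adjacency.

    I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    where w_ij = 1 for cells sharing an edge and 0 otherwise,
    N is the number of cells, W the sum of all weights, m the mean.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    n = n_rows * n_cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[value - mean for value in row] for row in grid]

    denominator = sum(d * d for row in dev for d in row)
    if denominator == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    cross_products = 0.0
    weight_sum = 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Visit the four rook neighbours that fall inside the grid.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross_products += dev[r][c] * dev[rr][cc]
                    weight_sum += 1

    return (n / weight_sum) * (cross_products / denominator)


# A checkerboard is perfectly dispersed: every neighbour has the opposite value.
checkerboard = morans_i([[1, 0], [0, 1]])   # -1.0
# Identical values along each row give positive autocorrelation.
banded = morans_i([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

Values near +1 indicate clustering of similar values, values near -1 indicate dispersion, and values near 0 indicate no spatial pattern under this neighbourhood definition.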

Webinar: Getting started with Sentinel Data (V. O’Brien).

Sentinel data - demo session to:

  • Select tile
  • Download images
  • Process data

AFTERNOON

Parallel session 9a. Command line GIS - Getting started with GRASS and QGIS (Casalegno).

This session introduces the GRASS geographic information system through its command line interface for spatial data processing, and QGIS as a map visualization and overlay tool. We do not expect any previous knowledge of GRASS or QGIS, but we will use the basic Bash and GDAL command line skills acquired in the previous days. This session is of interest to people who deal with general GIS analysis, hydrological modelling, DEMs, vector/raster data integration, etc. It is fundamental for tomorrow's topics.

  1. Introduction to GRASS data structure and environment - (Lecture).
  2. GRASS and QGIS as learning tools - (Lecture).
  3. Accessing GRASS and links to QGIS - (Hands on tutorial).
  4. Command syntax and general commands of data handling - (Hands on tutorial).
  5. GRASS working environment and bash working directory - (Hands on tutorial).
  6. Location and mapset - (Hands on tutorial).
  7. Region settings - (Hands on tutorial).
  8. Raster and vector data import, export, display and conversion - (Hands on tutorial).
  9. Raster map calculator - (Hands on tutorial).
  10. Vector manipulation and processing - (Hands on tutorial).
  11. Production of maps and tables layout for reporting - (Hands on tutorial).

Parallel session 9b.  Advanced data processing using GRASS (Amatulli).

This parallel session assumes a good understanding and reasonable command of the Bash command line and full knowledge of GRASS locations and mapsets. It is mainly intended for people who are already moving towards large data processing and are willing to explore batch job processing and multicore data manipulation for geocomputation. (This session can be dropped if nobody has used GRASS.)

  1. GRASS70 Create Location using ancillary layer - (Hands on tutorial).
  2. Setting GRASS70 variables for GRASS bash job - (Hands on tutorial).
  3. Create a Location, enter in GRASS and import data - (Exercise).

16:00 - Webinar: Earth Observation Data and Smart analytics with python: Tom Jones (Satellite Applications Catapult, Harwell, UK)

  1. Introduction to Earth observation & state of the art commercial satellite imagery
  2. Highlighting key components for generating a satellite-derived land cover classification
  3. Demo:
    - Libraries: ARCSI, RSGISLib, sk-learn, numpy etc.
    - Key processes: atmospheric correction, segmentation, machine learning classification, accuracy assessment
    - Potential bonus: classification using a deep learning technique

DAY 5

MORNING

Session 10a. Advanced hydrological modelling with GRASS (Amatulli) (2h)

This session summarizes the power of GRASS for hydrological modelling using a multicore add-on. A brief talk is followed by a small case study.

  1. Near-global freshwater-specific environmental variables for biodiversity analyses at 1 km resolution (lecture - 30 min)
  2. GRASS r.stream.watersheds & r.stream.variables add-on (Tutorial - 30 min)
  3. Calculation of contiguous stream-specific variables (Exercise - 1h)

Parallel session 10b. Species distribution modelling (Casalegno) (2h)

This session is proposed as complementary to session 9a and will focus on one or two of the proposed case studies.

  1. Introduction to data processing for spatial based predictive models and species distribution models (SDM): Modelling and mapping the current and future Natural vegetation suitability distribution using the randomForest classifier - (Lecture).

11:00 am - Webinar: Introduction to the GWmodel R package (Dr. Paul Harris, Rothamsted Research, North Wyke, UK)
In this presentation, geographically weighted (GW) models are introduced. GW models suit situations when data are not described well by some global model, but where there are regions where a localised calibration provides a better description.
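
The idea of a localised calibration can be illustrated with one of the simplest geographically weighted statistics, a GW mean: at each location, observations are averaged with weights that decay with distance. The sketch below is a standalone Python illustration and does not use or reproduce the GWmodel R package. The Gaussian kernel, the fixed bandwidth and the function name gw_mean are assumptions made for this example.

```python
import math


def gw_mean(points, values, bandwidth):
    """Geographically weighted mean at every observation location.

    points:    list of (x, y) coordinates.
    values:    list of observed values, one per point.
    bandwidth: kernel bandwidth in coordinate units (assumed fixed).
    Weights follow a Gaussian kernel: w = exp(-0.5 * (d / bandwidth)^2).
    """
    local_means = []
    for x0, y0 in points:
        weighted_sum = 0.0
        weight_total = 0.0
        for (x, y), value in zip(points, values):
            distance = math.hypot(x - x0, y - y0)
            weight = math.exp(-0.5 * (distance / bandwidth) ** 2)
            weighted_sum += weight * value
            weight_total += weight
        # weight_total >= 1 because each point weights itself with 1.
        local_means.append(weighted_sum / weight_total)
    return local_means


# Two distant clusters with very different values.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
observed = [0.0, 0.0, 10.0, 10.0]
global_mean = sum(observed) / len(observed)
local = gw_mean(coords, observed, bandwidth=1.0)
```

In this toy case the global mean sits halfway between the two clusters and describes neither of them, while the local means stay close to the values observed within each cluster.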

AFTERNOON

Session 11. Remote sensing applications (Amatulli) (2h) TBC

  1. Introduction to image classification of remotely sensed images.
  2. Remote sensing application: Burned Area Mapping using Support Vector Machine and Combined Segmentation Random Forest classifiers (Hands on tutorial) (30min)

Session 12. Geospatial data processing: find the right tool for the job (all students, Amatulli, Casalegno)

This session summarizes and reviews the tools presented so far, explores more of the open source libraries available and aims to clarify any issues or questions that have arisen.

  1. Exploring more spatial data libraries: pktools, openforis, morfeo toolbox - differences and complementarities with GDAL/OGR, GRASS and R. (Lecture 10 min)
  2. What’s best for what in choosing and using the right tool.  (Round table discussion 30 min).

Session 13. Working on students' personal data.

CONCLUSION – Focus on the students' project needs and how to get going with the use of free and open source tools for advanced spatial data processing.

Round table, question and answer session on specific data processing needs and how to follow up the summer school using open source software as a daily working toolbox.

Social Dinner: Enjoy southern Italian local products in a wonderful atmosphere in the ancient “Sassi”

Spatial Ecology will conclude our full immersion week offering a very special dinner in a local restaurant, so be ready to process excellent food & wine rather than data.