
Spatial ecology in collaboration with the University of Basilicata DICEM (Dipartimento delle Culture Europee e del Mediterraneo).

Summer school:

Hands-on Open Source Drone Mapping and High Performance Computing for Big Geo-Data

Matera, Italy, 13th-17th June 2016

Trainers:

Dr. Giuseppe Amatulli (Yale University, USA; Spatial ecology).

Dr. Stefano Casalegno (University of Exeter, UK; Spatial ecology).

Alex Dumitru (Jacobs University, Bremen, Germany).

Dr. Francesco Lovergine (CNR Bari, Italy).

Dr. Salvatore Manfreda (University of Basilicata, Matera, Italy).

Vlad Merticariu (Jacobs University, Bremen, Germany).

Remote conferences:

Prof. Peter Baumann (Jacobs University, Germany).

Dr. Andrew Cowley (University of Exeter, UK).

Stephen Mather (OpenDroneMap).

Nate Smith (Development Seed; Humanitarian OpenStreetMap Team).

Dmitry Spodarets (FlyElephant).

This summer school is a 5-day immersion in advanced data processing using high performance computers (HPC) and emerging technologies such as drone mapping, rasdaman (Fastest Array Database on Earth) and cloud computing. We provide a walk-through journey from an introduction to the Linux operating system and various open source software, to capturing data in the field using an Unmanned Aerial Vehicle (UAV), to complex image processing and data analyses. We focus on how to process data in different environments according to data type and size: maximising computational performance using multiple cores on a single computer, switching to distributed clusters of computers (using a grid engine scheduler) and, ultimately, data analytics with cutting-edge rasdaman software (Big Data Analytics Server) and on-demand services such as Amazon Web Services (cloud computing).

Course requirements:

The summer school is aimed at students who already use a command line approach to spatial data processing, via Python, R or GRASS. We expect students to have basic knowledge of geographical data analyses and of Geographic Information Systems.

It is an advanced training course that builds on the fundamentals taught during the previous week's (6-10 June) summer school on Spatio-Temporal data analyses using free and open source software. We recommend that beginners attend both summer schools. Students need their own laptops with a minimum of 4GB RAM and 30GB of free disk space.

Registration:

Registration is on a first come, first served basis and will close when it reaches 25 participants, so we encourage participants to register as soon as possible. A waiting list will be established if registrations exceed this limit.

Academic programme:

The summer school provides students with the opportunity to develop key skills required for advanced spatial data processing. Throughout the training, students will focus on developing independent learning skills, which are fundamental for continuous learning in advanced data processing: a journey that keeps progressing with the availability of ever more complex data and the ongoing technological revolution.

During this week we guide students through accessing the Linux OS and different types of data processing software, gathering field data using a UAV (drone), and actively processing data on single computers, on cloud facilities and on computer clusters using a grid engine scheduler. To demonstrate cloud computing we have been granted an “Amazon Web Services Educate” resource, which will allow students to access their own cloud computing session. We have also assembled a fully functional micro cluster computer, “Grappolo”, which is a portable and pedagogic machine replicating the operating system and functioning of a real high performance computer. Students will access “Grappolo” and learn how to process BIG spatial data using scripting routines within a grid scheduler environment. On the open source side, we will show different types of cutting-edge technologies: GRASS and R scripting, different Python libraries, rasdaman (Fastest Array Database on Earth) and OpenDroneMap (data processing toolchain for civilian unmanned aerial system image processing).

Overall, we focus our training on helping students to develop independent learning skills to find online help, solutions and strategies to fix bugs and independently progress with complex data processing problems.

The Academic Programme is divided into the following areas of study and interactions:

Field experiment: Demonstration of a flight mission for very high resolution data gathering.

Lectures: (15min to 1h each) Students will take part in a series of lectures introducing the basic functioning of tools, theoretical aspects and background information needed to better understand the concepts subsequently applied in advanced data processing.

Hands-on Tutorials: Students will be guided through hands-on sessions in which trainers perform data analyses on real case study datasets (including data gathered in the field with the drone) and students follow the same procedure on their laptops. During tutorial sessions students are supported by two trainers: one for the demonstrations and one to supervise the students' work and provide individual guidance on coding.

Exercise: In addition to tutorials and lectures, students are encouraged to take up their own independent study during exercise sessions. Specific tasks will be set to reinforce the newly acquired data processing skills presented in lectures and practised during the tutorial sessions. These exercise sessions equip students with the confidence and skills to become independent learners and to engage effectively with the demands of advanced spatial data processing.

Round table discussions: these sessions focus on exchanging experiences, needs and points of view. We aim to clarify students' specific needs and challenges, and to focus on how to help them find solutions through problem solving.

Learning objectives

Our summer school will enable students to further develop and enhance their spatio-temporal data processing skills. We will focus on understanding dataflows and the bottlenecks that increase processing time, and on how to make the most of the available hardware, from multiple cores on a single laptop to distributed computing systems (HPC clusters, cloud computing).

Additionally, our open source approach will allow participants to start using a fully functional operating system and software professionally. With continuous practice during the week, students will improve their command line approach and will focus on developing specific areas, including:

  • Developing a broad knowledge of existing solutions for Big Data processing and being able to judge which is most appropriate for their needs.
  • Building confidence with open source spatial data software and languages (Python, GRASS, R, rasdaman, OpenDroneMap) and with the Linux operating system.
  • Developing data processing skills on cluster computers, on-demand cloud processing services and grid schedulers; learning more about data types, data modelling and data processing techniques.
  • Encouraging independent learning, critical thinking and effective data processing.

Summer school certification

At the end of the summer school the attendees will receive a course certificate upon successful completion of the course, although it is up to the participant’s university to recognize this as official course credit.

Time table (7h teaching/day):

9:00 - 10:45     Morning session 1 (1h45min)
10:45 - 11:05    Coffee break
11:05 - 12:50    Morning session 2 (1h45min)
12:50 - 14:00    Lunch
14:00 - 15:45    Afternoon session 1 (1h45min)
15:45 - 16:00    Break
16:00 - 17:45    Afternoon session 2 (1h45min)

Summary

Day 1:

LINUX and OSGeo-live operating system

Open source drone mapping from A to Z

Day 2:

High resolution mapping from drone/UAV data

Optimizing data processing with multicore and cloud processing

Day 3:

Spatial data processing with Python

Day 4:

Big spatial data processing: rasdaman, the fastest array database on Earth

Day 5:

Cluster data processing and grid engines

DAY 1 Monday, June 13th

MORNING

Getting started: Knowing each other and LINUX OS (Amatulli, Casalegno; 9:00 - 13:00)

This session introduces the overall course programme. We will also learn how to install and use the open source virtual machine (operating system and software) that will be used during the summer school.

  1. Knowing each other: trainers and participants - Identifying participant expectations (Round-table)
  2. Course objectives and schedule.
  3. Linux environment and command line software and libraries for geocomputation: GRASS, R, GDAL/OGR, OpenDroneMap (Lecture)
  4. Installation and introduction to the Linux Virtual Machine (Hands on tutorial)
  5. www.spatial-ecology.net platform (Lecture)
  6. Lubuntu GUI and Unix/Linux command line (Hands on tutorial)
  7. The use of gedit as an editor (Hands on tutorial)
  8. Basic scripting in R, GRASS and GDAL/OGR (Exercise):
      1. Create a scripting procedure that downloads the MODIS Land Surface Temperature HDF file for day 001, year 2008: ftp://ladsweb.nascom.nasa.gov/allData/6/MOD11A2/2008/001/
      2. Create a composite using the MODIS_Grid_8Day_1km_LST:LST_Day_1km band
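The two exercise steps can be prepared in Python as a minimal sketch, assuming the HDF tiles have already been downloaded to a local folder. The subdataset string follows GDAL's HDF4-EOS naming convention; the scale factor of 0.02 (DN to Kelvin) and the fill value of 0 are assumptions taken from the MOD11A2 product documentation and should be checked against the metadata of your own file.

```python
import os

# GDAL addresses an HDF4-EOS grid field as HDF4_EOS:EOS_GRID:"<file>":<grid>:<field>
GRID = "MODIS_Grid_8Day_1km_LST"
FIELD = "LST_Day_1km"

# Assumption (MOD11A2 documentation): Kelvin = DN * 0.02, DN 0 = no data.
# Check these against the scale_factor and _FillValue stored in your own file.
SCALE = 0.02
FILL = 0


def lst_subdataset(hdf_path):
    """Return the GDAL subdataset name of the daytime LST band of one tile."""
    return f'HDF4_EOS:EOS_GRID:"{hdf_path}":{GRID}:{FIELD}'


def lst_subdatasets(folder):
    """Return subdataset names for every downloaded .hdf tile in a folder."""
    tiles = sorted(name for name in os.listdir(folder) if name.endswith(".hdf"))
    return [lst_subdataset(os.path.join(folder, name)) for name in tiles]


def dn_to_kelvin(values):
    """Convert raw LST digital numbers to Kelvin; fill values become None."""
    return [None if v == FILL else v * SCALE for v in values]
```

The list returned by lst_subdatasets can then be handed to a GDAL tool such as gdalbuildvrt to mosaic the tiles into a single virtual composite, provided your GDAL build includes the HDF4 driver.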

AFTERNOON

Introduction to UAV drone mapping for environmental monitoring, ecology and precision farming (Lecture - Dr. Salvatore Manfreda; 14:00 - 14:30)

  1. General introduction - potential of drone mapping (Lecture)
  2. Examples of UAS applications (Lecture)
  3. Monitoring river basin morphology (Lecture)

Open source toolchain for drone mapping: case study with an open hardware quadcopter, an NDVI camera and an open source data processing toolchain (Casalegno; 14:30 - 17:00)

  1. UAV-based photogrammetry for 3D model reconstruction and orthomosaics (Lecture).
  2. Basic functioning of the 3D Robotics (3DR) IRIS+ quadcopter (Hands on demonstration).
  3. Sensor: hacking a 16 Mpixel Canon IXUS 400 camera to simulate an “NDVI” signal.
  4. Customizing the IRIS+ to auto-trigger the camera (Hands on demonstration).
  5. CHDK, STICK software and scripting requirements for automatic data capture.
  6. Getting started with Mission Planner on the OSGeo-live virtual machine (Hands on tutorial).
  7. Parametrization settings for mapping (Hands on tutorial).
  8. Defining the mapping area and image requirements, and compiling the flight mission for the IRIS+ (Hands on tutorial).
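Items 7 and 8 rest on a few photogrammetric relations: the ground sampling distance GSD = sensor width × altitude / (focal length × image width in pixels), the image footprint on the ground, and the spacing between camera triggers and between flight lines needed for a given forward and side overlap. The sketch below is an independent back-of-the-envelope check, not Mission Planner's own code, and assumes a nadir-pointing camera over flat terrain with square pixels, mounted with its long side across the flight direction.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    sensor_width_mm: float  # physical sensor width (long side, across track)
    focal_length_mm: float  # lens focal length
    image_width_px: int     # pixels along the sensor width (across track)
    image_height_px: int    # pixels along the sensor height (along track)


def gsd_m(cam, altitude_m):
    """Ground sampling distance in metres per pixel (nadir view, flat ground)."""
    return cam.sensor_width_mm * altitude_m / (cam.focal_length_mm * cam.image_width_px)


def flight_plan(cam, altitude_m, forward_overlap, side_overlap):
    """Return GSD, image footprint, camera trigger distance and flight-line spacing.

    Overlaps are fractions (0.8 means 80 %). Square pixels are assumed.
    """
    gsd = gsd_m(cam, altitude_m)
    across = gsd * cam.image_width_px   # footprint across track (m)
    along = gsd * cam.image_height_px   # footprint along track (m)
    return {
        "gsd_m": gsd,
        "footprint_m": (across, along),
        "trigger_distance_m": along * (1 - forward_overlap),
        "line_spacing_m": across * (1 - side_overlap),
    }
```

For example, with hypothetical values typical of a 16 Mpixel 1/2.3-inch compact camera (6.17 mm sensor width, 4.3 mm focal length, 4608 × 3456 pixels) flown at 100 m, gsd_m returns about 0.031 m, i.e. roughly 3 cm per pixel.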

Field mapping experiment (Casalegno, Manfreda; 17:00 - 18:00)

High resolution data gathering with NDVI sensor - UAV quadcopter flight mission.

  1. Setting ground georeference points.
  2. Setting reflectance reference colour targets for later image parametrisation.
  3. Double-check hardware functioning.
  4. UAV flight mission.
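Once the raw digital numbers have been calibrated to reflectance (which is what the reference targets of item 2 are for), an NDVI value can be computed per pixel with the standard formula NDVI = (NIR - Red) / (NIR + Red). The sketch below assumes two already-calibrated bands stored as equally sized nested Python lists; which camera channel actually carries the near-infrared signal depends on the filter fitted to the modified camera, so the band assignment is an assumption.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns None where both bands are 0."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D lists."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]
```

Values range from -1 to 1; dense, healthy vegetation reflects strongly in the near-infrared and pushes NDVI towards the upper end.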

OpenDroneMap

Open Source Toolkit for processing Civilian Drone Imagery (webinar: Stephen Mather; 18:00 - 18:45)

DAY 2 Tuesday, June 14th

MORNING

High resolution mapping from UAV data: OpenDroneMap processing toolchain (Casalegno; 9:00 - 10:00)

  1. Downloading data and Pixhawk log files (from a local FTP server) and geotagging images with latitude/longitude coordinates (Hands on tutorial).
  2. Data processing toolchain with OpenDroneMap (Lecture): point clouds, digital surface models, textured digital surface models, orthorectified imagery, classified point clouds, digital elevation models.
  3. Data processing toolchain with OpenDroneMap (Hands on tutorial).
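Geotagging (item 1) amounts to giving each photo the position the autopilot logged closest to its capture time. The sketch below shows the simplest version, nearest-neighbour matching in time; it assumes the log has already been parsed into (timestamp_s, lat, lon) tuples and that a constant clock offset between camera and autopilot is known. Parsing the Pixhawk log itself is not shown, and production tools may interpolate between fixes instead.

```python
import bisect


def nearest_fix(times, log, t):
    """Return the GPS log record whose timestamp is closest to time t.

    times: sorted log timestamps (seconds); log: matching records of the
    form (timestamp_s, lat, lon). Ties go to the earlier record.
    """
    i = bisect.bisect_left(times, t)
    if i == 0:
        return log[0]
    if i == len(log):
        return log[-1]
    before, after = log[i - 1], log[i]
    return before if t - before[0] <= after[0] - t else after


def geotag(images, log, clock_offset_s=0.0):
    """Map image names to (lat, lon) using the nearest GPS fix in time.

    images: dict of image name -> camera capture time in seconds.
    clock_offset_s: added to camera times to align them with the log clock.
    """
    log = sorted(log)
    times = [rec[0] for rec in log]
    return {name: nearest_fix(times, log, t + clock_offset_s)[1:]
            for name, t in images.items()}
```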

FlyElephant: Your Home for High Performance Computing (webinar: Dmitry Spodarets; 10:00 - 10:30)

Drone image processing on web-based cloud infrastructure.

Multicore and cloud processing (Amatulli).

In this session we introduce the concepts of computer performance and architecture. Once we have created our fantastic scripting routine, our computer sometimes takes ages to process the data, and we end up waiting hours or even days! This session explains how to make the most of the available computing resources and process data more efficiently, and teaches grid computing on local and remote servers.

  1. Understanding PC performance (Lecture).
  2. Set up your VM for multicore computation (Hands on tutorial).
  3. Multicore and parallel processing on the local machine (Hands on tutorial).
  4. Multicore computing on a laptop with xargs (Exercise).
  5. Cloud computing in Amazon Web Services (AWS) (Hands on tutorial):
      1. Log in to an Amazon Ubuntu machine
      2. Transfer data to AWS
      3. Multicore computation in AWS with xargs
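The xargs exercise runs one command per input file with several commands in flight at once, e.g. `ls *.tif | xargs -n 1 -P 4 gdalinfo`. For readers who prefer to drive the same pattern from Python, here is a minimal standard-library sketch; the demo commands are placeholders.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=4):
    """Run commands (argument lists) with at most max_workers running at once,
    similar to `xargs -P`, and return their exit codes in input order.

    Threads are sufficient here because each thread only waits for an external
    process; the actual CPU work happens in the child processes.
    """
    def run(cmd):
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Demo: four small Python processes, two at a time.
    demo = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]
    print(run_parallel(demo, max_workers=2))
```

Setting max_workers to the number of cores reported by the VM (or the AWS instance) mirrors the usual choice for `xargs -P`.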

AFTERNOON

Language integration (Amatulli).

This session assumes a basic understanding of different programming languages. We focus on exchanging variables and settings between languages within the same workflow, which is useful for people who are already moving towards large data processing and want to learn how to optimize their processing chain. We will use Bash as the main language, from which we will call and integrate R, GRASS, GDAL and pktools.

  1. Combining multiple languages in a scripting process (Lecture).
  2. Importing and exporting variables between different languages
  3. The EOF (here-document) syntax to integrate languages (Tutorial)
  4. Integrating Bash and R (Tutorial and Exercise)
  5. Combining pktools/GDAL/oft-tools and R (Tutorial and Exercise)
  6. Full toolchain of remote sensing data analysis: downloading, reprojecting, clipping, merging and analysing data (Examples).
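The EOF syntax pipes a block of text, with shell variables already expanded, into another interpreter, e.g. `R --vanilla <<EOF ... EOF`. The sketch below reproduces the same idea in Python so the mechanism can be tested without R; the choice of string.Template for substitution is an assumption of this sketch, not part of the course material.

```python
import subprocess
import sys
from string import Template


def run_with_stdin(interpreter, script_template, **variables):
    """Fill $placeholders in a script and feed it to an interpreter on stdin,
    like a Bash here-document (cmd <<EOF ... EOF). Returns stdout as text.

    As in an unquoted here-document, a literal dollar sign must be escaped:
    write $$ here, where Bash would need a backslash before the $.
    """
    script = Template(script_template).substitute(variables)
    result = subprocess.run(interpreter, input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Demo with Python as the "other" language; with R installed, an
    # interpreter such as ["R", "--vanilla", "--quiet"] could be used instead.
    print(run_with_stdin([sys.executable], "print($a + $b)", a=2, b=3))
```

The escaping rule matters for R in particular, because R uses `$` to access list elements and data frame columns.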

OpenAerialMap: the open collection of aerial imagery (webinar: Nate Smith, Development Seed; 16:00 - 17:00)

DAY 3 Wednesday, June 15th

MORNING

Spatial data processing with python for drone mapping (Lovergine)

Only a few of these topics will be shown:

  1. Python and how to think the Python way (Short Lecture).
  2. Getting started with Python for geospatial applications (Hands on tutorial).
  3. Useful modules (Hands on tutorial and exercises):
      1. urllib and minidom
      2. ElementTree
      3. WKT
      4. JSON modules
      5. Shapely, PyShp, OGR and friends
      6. PIL (Python Imaging Library) and NumPy (Numerical Python Library)
      7. Pandas (Python Data Analysis Library) and SciPy (Scientific libraries for Python)
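As a small taste of the ElementTree, JSON and WKT items, the sketch below reads the track points of a GPX 1.1 file (the format most GPS receivers export) and writes them out as a GeoJSON feature and as a WKT LINESTRING, using only the standard library. The sample coordinates in the test are invented.

```python
import json
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_track_coords(gpx_text):
    """Return [lon, lat] pairs for every track point of a GPX 1.1 document."""
    root = ET.fromstring(gpx_text)
    return [[float(pt.get("lon")), float(pt.get("lat"))]  # GeoJSON order: lon, lat
            for pt in root.iterfind(".//gpx:trkpt", GPX_NS)]


def to_geojson(coords):
    """Wrap coordinates in a GeoJSON LineString feature, serialized with json."""
    return json.dumps({
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coords},
        "properties": {},
    })


def to_wkt(coords):
    """Write the same coordinates as a WKT LINESTRING (x y order)."""
    return "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in coords) + ")"
```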

AFTERNOON

Rasdaman: the world's most flexible and scalable Array Engine (webinar: Prof. P. Baumann; time TBD)

More spatial data processing with python (F. Lovergine)

Only a few of these topics will be shown:

Hands on tutorial/exercises on relevant toy problems:

  1. Representing and storing geospatial data
  2. Measuring distances
  3. Coordinate conversion and reprojection
  4. Editing shapefiles and performing selections
  5. Dot density calculation
  6. Choropleth maps
  7. Using spreadsheets
  8. Using GPS data
  9. Python and remote sensing
  10. Swapping image bands
  11. Creating histograms
  12. Clipping images
  13. Classifying images
  14. Extracting features from images
  15. Simple change detection
  16. Python and elevation data
  17. ASCII grid files
  18. Creating a shaded relief
  19. Creating elevation contours
  20. Working with LIDAR
  21. Python and FOSS spatially-enabled databases
  22. MySQL, PostGIS and SpatiaLite at Python level
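The "Measuring distances" toy problem is usually introduced with the haversine formula, which gives the great-circle distance between two latitude/longitude points on a sphere. The sketch below assumes a spherical Earth with a mean radius of 6371 km, which is accurate to within a fraction of a percent; for survey-grade work an ellipsoidal calculation is preferable, for example `pyproj.Geod(ellps="WGS84").inv(lon1, lat1, lon2, lat2)`, which returns the forward azimuth, back azimuth and distance in metres.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of longitude along the equator comes out at about 111.2 km with this radius.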

Round table question and answer session on geo-data processing: current applications, limits and perspectives (Amatulli, Casalegno, Lovergine)

DAY 4 Thursday, June 16th

MORNING

Rasdaman: Presenting the core concepts and technology (Merticariu & Dumitru):

In geosciences, and especially in the fields of remote sensing and geomatics, large amounts of raster data frequently need to be stored and processed efficiently. Rasdaman tackles the big data deluge by providing a scalable array database that can store complex geographic data structures and expose them through open, standard web services. In fact, the rasdaman team is actively shaping the OGC Big Geo Data standards.

Rasdaman is the OGC Web Coverage Service (WCS) Core Reference Implementation, but it also supports the Web Map Service (WMS) and the Web Processing Service (WPS), for example. A particularly exciting extension of the WCS service is the Processing Extension, which brings in the Web Coverage Processing Service (WCPS): users can exploit the flexibility of a fully fledged query language for coverages to request ad-hoc parallel processing directly on the server, minimizing data transfer and response times.

  1. The rasdaman array model and its query language
  2. The CIS coverage model
  3. The Web Coverage Service and its extensions

AFTERNOON

Hands on Rasdaman

Hands-on exercises, experimenting with real data using the above-mentioned services. Workshop participants will:

  1. Deploy rasdaman and its required components to create a local service
  2. Learn how to ingest data into rasdaman and how to expose it as a coverage. Three types of coverages will be handled: 2D rectified images, 3D regular time series of satellite imagery and 3D irregular time series of satellite imagery
  3. Learn how to use the WCS service to access coverages
  4. Explore the extensions of the WCS service and the functionality they provide
  5. Process the coverages ad hoc using WCPS queries
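Items 3 and 5 both come down to HTTP requests against the server. The sketch below builds a WCS 2.0.1 GetCoverage request in key-value-pair (KVP) encoding, with one repeated subset parameter per trimmed axis, and a ProcessCoverages request carrying a WCPS query. The endpoint URL, the coverage name, the axis labels and the query text are all hypothetical; adjust them to your own deployment and to the axes your coverage actually exposes.

```python
from urllib.parse import urlencode

# Hypothetical local endpoint; adjust to your own rasdaman deployment.
ENDPOINT = "http://localhost:8080/rasdaman/ows"


def get_coverage_url(coverage_id, subsets, fmt="image/tiff", endpoint=ENDPOINT):
    """Build a WCS 2.0.1 GetCoverage request (KVP encoding).

    subsets: list of (axis, low, high) tuples, e.g. ("Lat", 40, 41).
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    # The subset parameter may be repeated, once per trimmed axis.
    params += [("subset", f"{axis}({low},{high})") for axis, low, high in subsets]
    params.append(("format", fmt))
    return endpoint + "?" + urlencode(params)


def wcps_url(query, endpoint=ENDPOINT):
    """Build a request that asks the server to evaluate a WCPS query."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "ProcessCoverages"),
        ("query", query),
    ]
    return endpoint + "?" + urlencode(params)
```

A hypothetical query such as `for $c in (MyCoverage) return encode($c, "image/png")` is evaluated entirely on the server, so only the encoded result travels over the network.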

DAY 5 Friday, June 17th

MORNING

Working in tiles for image processing (Amatulli).

In this section we explain how to split images into tiles, run an operation on the tiles in parallel on multiple cores, and merge the tiles back together.

  1. The use of .vrt (GDAL virtual raster) files for splitting and merging images.
  2. Working in tiles with GDAL and pktools in a multicore environment.
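The splitting step can be planned by computing pixel windows and passing each one to `gdal_translate -srcwin xoff yoff xsize ysize`. The sketch below generates the windows for a raster of known size, clipping the edge tiles, and turns them into one command per tile; the tile file names are hypothetical. The commands can then run in parallel (for example with xargs -P), and gdalbuildvrt can reassemble the processed tiles into a virtual mosaic.

```python
def tile_windows(width, height, tile_size):
    """Yield (xoff, yoff, xsize, ysize) windows covering a width x height raster.

    Edge tiles are clipped so that no window extends past the raster.
    """
    for yoff in range(0, height, tile_size):
        for xoff in range(0, width, tile_size):
            yield (xoff, yoff,
                   min(tile_size, width - xoff),
                   min(tile_size, height - yoff))


def srcwin_commands(src, width, height, tile_size):
    """Build one gdal_translate command (argument list) per tile.

    Output names of the form tile_<xoff>_<yoff>.tif are a hypothetical choice.
    """
    return [
        ["gdal_translate", "-srcwin", str(x), str(y), str(w), str(h),
         src, f"tile_{x}_{y}.tif"]
        for x, y, w, h in tile_windows(width, height, tile_size)
    ]
```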

Introducing cluster computers (ESI Beowulf Cluster: Carson; webinar: A. Cowley; 10:00 - 11:00)

Batch processing data using GRASS geographic information system (Amatulli).

(This section may be dropped or replaced if participants are not interested in GRASS.)

In this section we explain how to make the most of GRASS and explore the huge potential of splitting large data processing routines into multiple smaller tasks performed on multiple CPUs or on computer clusters.

  1. Multicore GRASS geodata processing.
  2. GRASS70: creating a location using an ancillary layer
  3. Setting GRASS70 variables for batch job processing (in Bash or Python scripting)
  4. Exercise: create a location, enter GRASS and import data
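Running GRASS modules from a batch script (item 3) means recreating the environment an interactive session would set up: a gisrc file naming the database, location and mapset, plus GISBASE, GISRC and the search paths. The sketch below follows that general recipe for GRASS 7; the installation path (e.g. a hypothetical /usr/lib/grass70), the database path and the location name are assumptions, and the exact set of variables can differ between versions and installations.

```python
import os


def grass_batch_env(gisbase, gisdbase, location, mapset, gisrc_path):
    """Write a gisrc file and return an environment for running GRASS modules
    outside an interactive session (e.g. via subprocess.run(..., env=env))."""
    with open(gisrc_path, "w") as f:
        f.write(f"GISDBASE: {gisdbase}\n")
        f.write(f"LOCATION_NAME: {location}\n")
        f.write(f"MAPSET: {mapset}\n")
        f.write("GUI: text\n")
    env = dict(os.environ)
    env["GISBASE"] = gisbase
    env["GISRC"] = gisrc_path
    # GRASS binaries and scripts must be on PATH, its libraries on LD_LIBRARY_PATH
    env["PATH"] = os.pathsep.join(
        [os.path.join(gisbase, "bin"), os.path.join(gisbase, "scripts"), env.get("PATH", "")])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(
        [os.path.join(gisbase, "lib"), env.get("LD_LIBRARY_PATH", "")])
    # GRASS Python scripting library
    env["PYTHONPATH"] = os.pathsep.join(
        [os.path.join(gisbase, "etc", "python"), env.get("PYTHONPATH", "")])
    return env
```

Because every batch job can point GISRC at its own file, several jobs can work in different mapsets at the same time, which is what makes splitting GRASS work across cores or cluster nodes practical.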

AFTERNOON

Hands on a micro cluster computer (Casalegno)

  1. Introduction to the portable and low cost micro-cluster “Grappolo” (Lecture).
  2. Grid engine scheduler in Grappolo (Hands on tutorial).
  3. Build your own cluster (Lecture).
  4. Hands on Grappolo: logging in, exploring the OS and the main grid scheduler commands (Hands on tutorial).

Hands on data processing with Grid Engine scheduler (Casalegno)

  1. GRASS in grappolo - (Exercise).
  2. R in grappolo - (Exercise).
  3. GDAL in grappolo - (Exercise).
  4. Grid engine arrays - (Exercise).
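Grid engine arrays submit one job with many tasks: `qsub -t 1-N job.sh` (or a `#$ -t 1-N` line inside the job script) starts N tasks, and Grid Engine sets SGE_TASK_ID in each task's environment. A common pattern is to use that number to pick one input file per task. The sketch below shows the selection logic; in a hypothetical setup, job.sh would run `python pick_input.py a.tif b.tif c.tif` and the script would call task_input(sys.argv[1:]).

```python
import os


def task_input(inputs, env=None):
    """Return the input assigned to the current Grid Engine array task.

    Grid Engine exposes the task number (from the range given to qsub -t)
    as SGE_TASK_ID; task 1 gets inputs[0], task 2 gets inputs[1], and so on.
    """
    env = os.environ if env is None else env
    task_id = int(env["SGE_TASK_ID"])
    return inputs[task_id - 1]
```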

CONCLUSION - Students' needs and how to get going with the use of drones, HPC clusters and open source technology in the “new age” of data processing. (1h)

Round table, question and answer session on the specific data processing needs of students and on which open source tools and technologies are most appropriate to use.

Social dinner: enjoy local southern Italian products in a wonderful atmosphere in the ancient “Sassi”:

We will conclude our full immersion week with a dinner in a local restaurant, so be ready to process excellent food & wine rather than data.