Spatial ecology in collaboration with the University of Basilicata DICEM (Dipartimento delle Culture Europee e del Mediterraneo).
Hands-on Open Source Drone Mapping and High Performance Computing for Big Geo-Data
Matera, Italy, 13th-17th June 2016
Alex Dumitru (Jacobs University, Bremen Germany).
Dr. Francesco Lovergine (CNR Bari, Italy).
Dr. Salvatore Manfreda (University of Basilicata Matera, Italy)
Vlad Merticariu (Jacobs University, Bremen Germany).
Prof. Peter Baumann (Jacobs University, Germany).
This summer school is an immersive 5-day experience in advanced data processing using high performance computers (HPC) and emerging technologies such as drone mapping, rasdaman (Fastest Array Database on Earth) and cloud computing. We provide a walk-through journey from an introduction to the Linux operating system and various open source software, to capturing data in the field using an Unmanned Aerial Vehicle, to complex image processing and data analyses. We focus on how to process data in different environments according to data type and size: maximising computational performance using multiple cores on a single computer, switching to distributed clusters of computers (using the Grid Engine scheduler) and ultimately performing data analytics with cutting-edge rasdaman software (Big Data Analytics Server) and on-demand services such as Amazon Web Services (cloud computing).
The summer school is aimed at students who already use a command line approach to spatial data processing via Python, R or GRASS. We expect students to have basic knowledge of geographical data analysis and the use of Geographic Information Systems.
It is an advanced training course that builds on the fundamentals taught during the previous week's (6-10 June) summer school on Spatio-Temporal data analyses using free and open source software. We recommend that beginner students attend both summer schools. Students need their own laptops with a minimum of 4GB RAM and 30GB free disk space.
Registration is on a first come, first served basis and will close when it reaches 25 participants. We therefore encourage participants to register as soon as possible. A waiting list will be established if the limit is exceeded.
The summer school provides students with the opportunity to develop key skills required for advanced spatial data processing. Throughout the training, students will focus on developing independent learning skills, which are fundamental to a continuous learning process in advanced data processing. This is an ongoing journey, driven by the availability of increasingly complex data and the current technological revolution. During this week we guide students through accessing the Linux OS and different types of data processing software, gathering field data using a drone (UAV), and actively processing data on single computers, on cloud facilities and on computer clusters using the Grid Engine scheduler. To demonstrate cloud computing, we have been granted an “Amazon Web Service Educate” resource which will allow students to access their own cloud computing session. We have also assembled a fully functional micro cluster computer, “Grappolo”, a portable teaching machine that replicates the operating system and functioning of a real high performance computer. Students will access “Grappolo” and learn how to process BIG spatial data using scripting routines within a grid scheduler environment. On the open source software side, we will show different types of cutting-edge technologies: GRASS and R scripting, different Python libraries, rasdaman (Fastest Array Database on Earth) and opendronemap (data processing toolchain for civilian unmanned aerial system image processing).
Overall, we focus our training on helping students to develop independent learning skills to find online help, solutions and strategies to fix bugs and independently progress with complex data processing problems.
The Academic Programme is divided into the following areas of study and interactions:
Field experiment: Demonstration of a flight mission for very high resolution data gathering.
Lectures: (15min to 1h each) Students will take part in a series of lectures introducing the basic functioning of tools, theoretical aspects and background information needed for a better understanding of the underlying concepts that will later be applied in advanced data processing.
Hands on Tutorials: Students will be guided through hands-on sessions in which trainers perform data analyses on real case study datasets (including data gathered in the field with the drone) and students follow the same procedure on their laptops. During tutorial sessions students are supported by two trainers, one giving the demonstrations and one supervising the students' work and providing individual guidance on coding.
Exercise: In addition to tutorials and lectures, students are encouraged to take up their own independent study during exercise sessions. Specific tasks will be set to reinforce the data processing skills presented in lectures and practised during the tutorial sessions. These exercise sessions equip students with the confidence and skills to become independent learners and to engage effectively with the demands of advanced spatial data processing.
Round table discussions: These sessions are mainly focused on exchanging experiences, needs and points of view. We aim to clarify students' specific needs and challenges, and to focus on how to get help and find solutions while problem solving.
Our summer school will enable students to further develop and enhance their spatio-temporal data processing skills. We will focus on understanding dataflows, the bottlenecks that increase processing time, and how to make the most of the available hardware, from using multiple cores on a single laptop to distributed computing systems (HPC clusters, cloud computing).
Additionally, our open source approach will allow participants to start using a fully functional operating system and software professionally. With continuous practice during the week, students will improve their command line skills and will focus on developing specific areas, including:
Summer school certification
Upon successful completion of the summer school, attendees will receive a course certificate, although it is up to each participant’s university to recognize this as official course credit.
Time table: (7h teaching/day)
DAY 1 Monday, June 13th
Getting started: Knowing each other and LINUX OS (Amatulli, Casalegno; 9:00 - 13:00)
This session introduces the overall course program. We also learn how to install and use the open source virtual environment, operating system and software which will be used during the summer school.
Introduction to UAV drone mapping for environmental monitoring, ecology and precision farming (Lecture - Dr. Salvatore Manfreda; 14:00 - 14:30)
Open source toolchain for drone mapping: case study with open hardware quadcopter, NDVI camera and open source data processing toolchain (Casalegno; 14:30 - 17:00)
Field mapping experiment (Casalegno, Manfreda; 17:00 - 18:00)
High resolution data gathering with NDVI sensor - UAV quadcopter flight mission.
Open Drone Map
Open Source Toolkit for processing Civilian Drone Imagery (webinar: Stephen Mather; 18:00 - 18:45)
DAY 2 Tuesday, June 14th
High resolution mapping from UAV data: opendronemap processing toolchain (Casalegno; 9:00 - 10:00)
Fly Elephant: Your Home for High Performance Computing (webinar: Dmitry Spodarets; 10:00 - 10:30)
Drone image processing on web based cloud infrastructure.
Multicore and cloud processing (Amatulli).
In this session the concepts of computer performance and architecture will be introduced. Once we have created our fantastic scripting routine, our computer sometimes takes ages to process the data, and we are left waiting for hours or days! In this session we explain how to optimize the computational implementation and process data more efficiently. The session also aims to teach grid computing on local and remote servers.
Language integration (Amatulli).
In this session a basic understanding of different programming languages is expected. Here we focus on exchanging variables and settings between languages within the same workflow, which is of interest to people who are already moving towards large data processing and want to learn how to optimize the processing chain. We will use Bash as the main language and we will call and integrate R, GRASS, GDAL and pktools.
OpenAerialMap: The open collection of aerial imagery (webinar: Nate Smith, Development Seed; 16:00 - 17:00)
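As a rough illustration of sharing settings between languages, the sketch below passes values from Python to a POSIX shell command through environment variables. It is not the course material: the session uses Bash as the driver, and the helper name `run_shell_with_vars`, the variable names and the output path are all hypothetical.

```python
import os
import subprocess


def run_shell_with_vars(command, variables):
    """Run a POSIX shell command with extra environment variables set.

    Returns the command's standard output as a string.
    """
    env = dict(os.environ)
    # Environment values must be strings, so convert numbers explicitly.
    env.update({name: str(value) for name, value in variables.items()})
    result = subprocess.run(
        ["sh", "-c", command],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise an error if the shell command fails
    )
    return result.stdout


# Hypothetical settings shared by every step of a workflow.
settings = {"TILE_SIZE": 512, "OUT_DIR": "/tmp/tiles"}
message = run_shell_with_vars(
    'printf "%s px tiles in %s" "$TILE_SIZE" "$OUT_DIR"', settings
)
```

The same pattern works in the other direction: a Bash script can export variables before calling a Python or R script, which then reads them from its environment.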
DAY 3 Wednesday, June 15th
Spatial data processing with python for drone mapping (Lovergine)
Only a few of these topics will be shown
Rasdaman: the world's most flexible and scalable Array Engine (webinar: Prof. P. Baumann - TBD)
More spatial data processing with python (F. Lovergine)
Only a few of these topics will be shown
Hands on tutorial/exercises on relevant toy problems:
Roundtable question and answer session on geo-data processing: current applications, limits and perspectives. (Amatulli, Casalegno, Lovergine)
DAY 4 Thursday, June 16th
Rasdaman: Presenting the core concepts and technology (Merticariu & Dumitru):
In geosciences, and especially in the fields of remote sensing and geomatics, large amounts of raster data frequently need to be stored and processed efficiently. Rasdaman tackles the big data deluge by providing a scalable array database that is capable of storing complex geographic data structures and exposing them through open and standard web services. In fact, the rasdaman team is actively shaping the OGC Big Geo Data standards.
Rasdaman is the OGC Web Coverage Service (WCS) Core Reference Implementation, but it also supports, for example, the Web Map Service (WMS) and the Web Processing Service (WPS). A particularly exciting extension of the WCS service is the Processing Extension. This links in the Web Coverage Processing Service (WCPS), which allows users to exploit the flexibility of a fully fledged query language for coverages to request ad-hoc parallel processing directly on the server, minimizing data transfer and response times.
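To give a feel for what a WCPS request looks like, here is a minimal sketch that builds a query and wraps it in a WCS ProcessCoverages request URL without sending it. The endpoint, the coverage name `DemoCoverage` and the axis names `Lat` and `Long` are assumptions for illustration only; real axis names and coverage identifiers depend on the data loaded in a given server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_wcps_url(endpoint, coverage, lat_range, lon_range, fmt="image/tiff"):
    """Build a WCS ProcessCoverages URL carrying a WCPS query.

    The query trims the coverage to a latitude/longitude box on the server
    and asks for the result encoded in the given format.
    """
    query = (
        f"for $c in ({coverage}) "
        f"return encode($c[Lat({lat_range[0]}:{lat_range[1]}), "
        f'Long({lon_range[0]}:{lon_range[1]})], "{fmt}")'
    )
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,
    }
    # urlencode percent-encodes the query so it is safe inside a URL.
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage; nothing is sent over the network.
example_url = build_wcps_url(
    "https://example.org/rasdaman/ows", "DemoCoverage", (40.0, 41.0), (16.0, 17.0)
)
# Decode the URL again to inspect the WCPS query it carries.
decoded_query = parse_qs(urlsplit(example_url).query)["query"][0]
```

Because the subsetting happens inside the query, only the requested box is transferred to the client rather than the full coverage.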
Hands-on exercises, experimenting with real data using the above mentioned services. Workshop participants will:
DAY 5 Friday, June 17th
Working in tiles for image processing (Amatulli).
In this section we explain how to tile several images, compute an operation on the tiles using multiple cores, and merge the tiles back together.
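The split, process and merge idea can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the image is a small list of lists rather than a real raster, the per-tile operation (doubling each pixel) is a stand-in, and all function names are hypothetical. Per-pixel operations like this one need no overlap between tiles; neighbourhood operations would.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_tiles(image, tile_size):
    """Cut a 2D list of pixel values into tiles of at most tile_size x tile_size.

    Returns (row_offset, col_offset, tile) tuples; edge tiles may be smaller.
    """
    n_rows, n_cols = len(image), len(image[0])
    tiles = []
    for r in range(0, n_rows, tile_size):
        for c in range(0, n_cols, tile_size):
            tile = [row[c:c + tile_size] for row in image[r:r + tile_size]]
            tiles.append((r, c, tile))
    return tiles


def rescale_tile(job):
    """Stand-in per-tile operation: double every pixel value."""
    r, c, tile = job
    return r, c, [[2 * v for v in row] for row in tile]


def merge_tiles(tiles, n_rows, n_cols):
    """Write each tile back at its offsets to rebuild the full image."""
    out = [[None] * n_cols for _ in range(n_rows)]
    for r, c, tile in tiles:
        for i, row in enumerate(tile):
            out[r + i][c:c + len(row)] = row
    return out


def process_in_tiles(image, tile_size, workers=1):
    """Split, process each tile (optionally in parallel), and merge."""
    jobs = split_into_tiles(image, tile_size)
    if workers > 1:
        # One tile per task, spread over several processes (cores).
        with ProcessPoolExecutor(max_workers=workers) as pool:
            done = list(pool.map(rescale_tile, jobs))
    else:
        done = [rescale_tile(job) for job in jobs]
    return merge_tiles(done, len(image), len(image[0]))


# A 4 x 5 toy image, processed serially in 2 x 2 tiles.
image = [[r * 5 + c for c in range(5)] for r in range(4)]
result = process_in_tiles(image, tile_size=2)
```

Passing `workers=4` runs the tiles in separate processes; on platforms that start processes by spawning, that call should sit under an `if __name__ == "__main__":` guard.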
Introducing cluster computers (ESI Beowulf Cluster: Carson, webinar: A. Cowley 10:00 - 11:00)
Batch processing data using GRASS geographic information system (Amatulli).
(This section can be dropped or replaced if nobody is interested in GRASS)
In this section we explain how to get the most out of GRASS and explore the huge potential of splitting large data processing routines into multiple smaller tasks performed on multiple CPUs or computer clusters.
Hands on a micro cluster computer (Casalegno)
Hands on data processing with Grid Engine scheduler (Casalegno)
CONCLUSION - Students' needs and how to get going with the use of drones, HPC clusters and open source technology in the “new age” of data processing. (1h)
Round table, question and answer session on students' specific data processing needs and which open source tools and technologies are most appropriate to use.
Social Dinner: Enjoy southern Italian local products in a wonderful atmosphere in the ancient “sassi”:
We will conclude our full immersion week with a dinner in a local restaurant, so be ready to process excellent food & wine rather than data.