The Azizi Data Management Platform (DMP)

http://azizi.ilri.org/repository

[Figure: Azizi DMP]

The use of the Open Data Kit (ODK) for data collection has grown rapidly. Form design, data collection and the submission of collected data are now well-established processes with strong support from the ODK community. While this upstream process works well, the downstream process still poses challenges. Common downstream challenges include:

  1. Exporting the data from ODK Aggregate format to a more relational format
  2. Merging of different versions of a form
  3. Management of pictures which were taken during data collection
  4. Extracting the data so that it can be easily used for analysis and for quick checking
  5. Data cleaning
  6. Exporting data to different formats for analysis
  7. Data sharing within the project while the project is still on-going
  8. Master reference of the data collected
  9. Simple data visualization
  10. Making the data open access

These are everyday challenges that ODK cannot solve and is not designed to address, so they require a tool outside ODK. This is appropriate, since ODK is purely a data collection system. The Data Management Platform (DMP) addresses these challenges by creating a central point of reference for all the data collected in a project.

Exporting the data from ODK Aggregate format to a more relational format

The DMP can export data from ODK Aggregate to a Microsoft Excel spreadsheet or directly into a PostgreSQL database.

Both exports are designed to be seamless, requiring only a few clicks.
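A minimal sketch of what "a more relational format" can mean, assuming submissions arrive as nested records in which a repeat group becomes a child table linked to its parent by the submission's instance ID. It uses Python's built-in sqlite3 as a stand-in for Postgres, and the field names (instance_id, household, village, animals, head_count) are hypothetical, not the DMP's actual schema.

```python
import sqlite3

# Hypothetical ODK-style submissions: each is a nested record, and the
# "animals" repeat group holds zero or more child entries.
SUBMISSIONS = [
    {"instance_id": "uuid:1", "household": "HH-01", "village": "Kiambu",
     "animals": [{"species": "cattle", "head_count": 3},
                 {"species": "goat", "head_count": 5}]},
    {"instance_id": "uuid:2", "household": "HH-02", "village": "Busia",
     "animals": [{"species": "pig", "head_count": 2}]},
]


def flatten(submissions):
    """Split nested submissions into parent rows and child rows.

    Every child row carries its parent's instance_id as a foreign key,
    which is what makes the result relational.
    """
    parents, children = [], []
    for sub in submissions:
        parents.append((sub["instance_id"], sub["household"], sub["village"]))
        for animal in sub.get("animals", []):
            children.append((sub["instance_id"], animal["species"], animal["head_count"]))
    return parents, children


def load(conn, parents, children):
    """Create one table per level of nesting and insert the flattened rows."""
    conn.execute(
        "CREATE TABLE household (instance_id TEXT PRIMARY KEY, household TEXT, village TEXT)"
    )
    conn.execute(
        "CREATE TABLE animal (instance_id TEXT REFERENCES household(instance_id), "
        "species TEXT, head_count INTEGER)"
    )
    conn.executemany("INSERT INTO household VALUES (?, ?, ?)", parents)
    conn.executemany("INSERT INTO animal VALUES (?, ?, ?)", children)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, *flatten(SUBMISSIONS))
    query = (
        "SELECT h.village, a.species, a.head_count FROM household h "
        "JOIN animal a ON a.instance_id = h.instance_id ORDER BY h.village, a.species"
    )
    for row in conn.execute(query):
        print(row)
```

The CREATE TABLE statements are also valid PostgreSQL; with a Postgres driver such as psycopg2, the `?` placeholders would become `%s`.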

Merging of different versions of a form

During data collection, it is sometimes unavoidable to use different versions of the same form. This complicates analysis, since the data ends up split across several forms. It is therefore essential to merge the data from the different versions and analyse it as a whole. The DMP does this by first importing the different datasets from Aggregate and then merging the forms, again with just a few clicks.
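The merge can be sketched as a column union: every question from every version becomes a column, rows are padded with empty values for questions their version did not ask, and each row is tagged with its form version. This is a hypothetical illustration, not the DMP's code, and it assumes that a question keeps the same name and meaning across versions.

```python
# Hypothetical exports of two versions of the same form. Version 2 added
# a "phone" question and dropped "landmark".
V1_ROWS = [{"household": "HH-01", "village": "Kiambu", "landmark": "school"}]
V2_ROWS = [{"household": "HH-02", "village": "Busia", "phone": "0700000000"}]


def merge_versions(versions):
    """Stack rows from several form versions into one table.

    versions: mapping of version label -> list of row dicts.
    Returns (columns, rows): the union of all columns in first-seen order,
    and every row padded with None for questions its version lacked,
    plus a "form_version" column recording where the row came from.
    """
    columns = ["form_version"]
    for rows in versions.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
    merged = []
    for label, rows in versions.items():
        for row in rows:
            record = {col: row.get(col) for col in columns}
            record["form_version"] = label
            merged.append(record)
    return columns, merged


if __name__ == "__main__":
    columns, rows = merge_versions({"v1": V1_ROWS, "v2": V2_ROWS})
    print(columns)
    for row in rows:
        print(row)
```

Keeping the version tag means analysts can still filter or model by version after the merge.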

Management of pictures which were taken during data collection

In ODK Aggregate, all pictures are stored as blobs in the Aggregate database. Extracting the pictures from a data collection exercise is usually difficult: in Aggregate it involves navigating to the link for each picture and downloading the pictures one by one. The DMP makes this easy. When exporting the data, the user simply specifies whether or not to include the pictures. The pictures are then automatically extracted from the database and saved in a folder, which is zipped and presented to the user as a download link.
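The extract-and-zip step can be sketched with Python's standard zipfile module. The blob rows below are hypothetical (how they are queried from the Aggregate database is not shown), and placing each submission's pictures in its own folder is an assumption made here to avoid file-name collisions.

```python
import io
import zipfile

# Hypothetical rows as they might come out of a media/blob table:
# (submission id, original file name, raw image bytes).
BLOB_ROWS = [
    ("uuid:1", "cow.jpg", b"\xff\xd8fake-jpeg-1"),
    ("uuid:2", "goat.jpg", b"\xff\xd8fake-jpeg-2"),
]


def zip_pictures(rows):
    """Write each blob into a single in-memory zip archive and return its bytes.

    Each submission gets its own folder so that pictures with the same
    file name from different submissions do not overwrite each other.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for submission_id, filename, data in rows:
            folder = submission_id.replace(":", "_")  # ':' is awkward in paths
            archive.writestr(f"{folder}/{filename}", data)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = zip_pictures(BLOB_ROWS)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        print(archive.namelist())
```

In a web application, the returned bytes would be saved or streamed behind the download link.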

Extracting the data so that it can be easily used for analysis and for quick checking

The DMP allows collected data to be exported as an Excel spreadsheet. The data can either be coded with the variables used when developing the collection form (for analysis), or have those variables substituted with the options that were presented to users during collection (for easy viewing). The export for analysis can be used directly by analysis software, while the export for easy viewing gives project coordinators a quick overview of the data collected so far.
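The difference between the two exports comes down to a lookup from coded values to labels. The sketch below assumes a choice list shaped like the "choices" sheet of an XLSForm; the question names and codes are hypothetical, and only single-choice answers are handled (ODK stores multiple-choice answers as space-separated codes, which would need splitting first).

```python
# Hypothetical choice lists: question name -> {coded value: label shown to enumerators}.
CHOICES = {
    "sex": {"1": "Male", "2": "Female"},
    "water_source": {"borehole": "Borehole", "river": "River / stream"},
}

ROWS = [
    {"household": "HH-01", "sex": "2", "water_source": "river"},
    {"household": "HH-02", "sex": "1", "water_source": "borehole"},
]


def to_labels(rows, choices):
    """Return new rows with coded answers replaced by their labels.

    Questions without a choice list (free text, numbers) and codes that
    are missing from the list are copied through unchanged; the input
    rows are not modified.
    """
    labelled = []
    for row in rows:
        new_row = {}
        for question, value in row.items():
            mapping = choices.get(question, {})
            new_row[question] = mapping.get(value, value)
        labelled.append(new_row)
    return labelled


if __name__ == "__main__":
    print(ROWS)                      # coded export, for analysis
    print(to_labels(ROWS, CHOICES))  # labelled export, for easy viewing
```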

Data cleaning

Data cleaning is an integral part of any data collection exercise. The DMP supports cleaning data either on the DMP itself or with third-party software such as R, Excel or Access.

Exporting data to different formats for analysis

Data is usually analysed with specialized analysis software such as SPSS, Stata or R, each of which expects data in a particular format. The DMP allows the data to be exported in various formats for analysis.

Disclaimer: Pending feature, but data can be accessed from R using a Postgres wrapper
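Since this feature is pending, the following is only an illustrative sketch of one common interchange target: delimited text (CSV or tab-separated) with a header row, which R, Stata and SPSS can all import. The function name and its parameters are hypothetical; it uses only Python's standard csv module.

```python
import csv
import io


def to_delimited(columns, rows, delimiter=","):
    """Serialise rows (dicts) to delimited text with a header row.

    Columns are written in the given order; None values become empty
    fields. A fixed "\n" line ending keeps the output predictable.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"household": "HH-01", "sex": "2"}, {"household": "HH-02", "sex": None}]
    print(to_delimited(["household", "sex"], rows))                  # CSV
    print(to_delimited(["household", "sex"], rows, delimiter="\t"))  # tab-separated
```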

Data sharing within the project while the project is still on-going

The DMP allows project coordinators to share sections of the data among themselves and with external parties. By granting different levels of access to different people, the DMP controls access to the data within the project while ensuring that all the relevant people can get to the data they need.
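One way to model "different levels of access" to "sections of the data" is an ordered set of levels plus per-user grants, where a grant on a section also covers its sub-sections. The level names, the slash-separated section paths and the inheritance rule are all assumptions for illustration; the source does not describe how the DMP stores permissions.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels; a higher level implies all lower ones."""
    NONE = 0
    READ = 1
    EDIT = 2
    ADMIN = 3


# Hypothetical grants: (user, dataset section) -> access level.
GRANTS = {
    ("coordinator", "household_survey"): Access.ADMIN,
    ("analyst", "household_survey"): Access.READ,
    ("external_partner", "household_survey/summary"): Access.READ,
}


def allowed(grants, user, section, needed):
    """Return True if the user's grant on the section, or on any parent
    section, is at least the needed level."""
    parts = section.split("/")
    best = Access.NONE
    # Walk from the section itself up to the top-level dataset.
    for i in range(len(parts), 0, -1):
        best = max(best, grants.get((user, "/".join(parts[:i])), Access.NONE))
    return best >= needed


if __name__ == "__main__":
    print(allowed(GRANTS, "analyst", "household_survey", Access.EDIT))
    print(allowed(GRANTS, "external_partner", "household_survey/summary", Access.READ))
```

Inheriting grants downward keeps external sharing narrow: a partner can be given one summary section without seeing the full dataset.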

Master reference of the data collected

The DMP provides a central point of reference for the clean version of each dataset. This is critical in large projects where different users work on different datasets and clean them at different stages. With a central point of reference, all users in the project have access to the master cleaned dataset, and any changes to the master dataset are available to everyone from one place.

Simple data visualization

The DMP is planned to include a module for simple data visualization.

Disclaimer: Pending functionality

Making the data open access

All data collected within ILRI must adhere to ILRI’s open access policies, which are available on the Data Portal and in the Open Access Documents. All data collected by ILRI is made public through the ILRI data portal, which is managed by the Research Methods Group (RMG). When a project decides to make its data openly accessible, the DMP exports the cleaned datasets to the ILRI portal.

Disclaimer: Pending functionality