Distributed Data Grids & Federation: The CyVerse Data Store
How does CyVerse work?
Science use cases using CyVerse
Photo labels: 22.1 m | 394 m | 2,000 kg | 480 V 3-phase AC | 30 ton
Photo: Jesse Rieser for The Wall Street Journal
Processing a whole lot of data
RGB | Thermal | Fluorescence | 3D | Hyperspectral |
~550GB | ~5.5GB | ~80GB | ~300GB | ~600GB |
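As a quick check on scale, the per-sensor volumes listed above can be summed. The slide does not state what collection period these figures cover, so the sketch below only adds the listed values and makes no claim about rate.

```python
# Approximate data volume per sensor, in GB, as listed on the slide.
SENSOR_VOLUMES_GB = {
    "RGB": 550.0,
    "Thermal": 5.5,
    "Fluorescence": 80.0,
    "3D": 300.0,
    "Hyperspectral": 600.0,
}


def total_volume_gb(volumes):
    """Sum the per-sensor volumes (in GB)."""
    return sum(volumes.values())


total_gb = total_volume_gb(SENSOR_VOLUMES_GB)
# Decimal units are assumed here (1 TB = 1000 GB).
print(f"Total: ~{total_gb:.1f} GB (~{total_gb / 1000:.2f} TB)")
```

RGB and hyperspectral imagery together account for roughly three quarters of the listed total.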
Plant Phenotyping Data transfer and computation
Diagram labels: (raw + processed) | DataStore | Cache Server | UA HPC | Algorithm | Collaborators and Public
Event Horizon Telescope
Diagram labels: (raw + processed) | DataStore | Cache Server | Open Science Grid | Algorithm | Collaborators and Public
From Citizen-Science to Your Phone: Insect Classification and Detection Using Self-Supervised Learning Methods
Zi Deng
PhD Student
Electrical Engineering
University of Arizona
Data Science Institute
Shivani Chiranjeevi
PhD Student
Mechanical Engineering
Iowa State University
AIIRA
Arti Singh
Assistant Professor
Department of Agronomy
Iowa State University
Goal: Mobile Insect Identification App
Mobile web app to accurately identify 142 agriculturally important insect-pest species (extended to 2526 species)
Key Features:
iNaturalist Open Dataset
Metadata Columns
http://inaturalist-open-data.s3.amazonaws.com/photos/[photo_id]/[size].jpg
Original | Large | Medium | Small | Thumb | Square |
2048px | 1024px | 500px | 240px | 100px | 75px x 75px |
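The URL pattern and size table above can be turned into a small helper. This is a minimal sketch: the function name `photo_url`, the lowercase size names in the path, and the photo ID `12345` are assumptions for illustration, not values taken from the slides.

```python
# Base of the URL pattern shown on the slide.
BASE_URL = "http://inaturalist-open-data.s3.amazonaws.com/photos"

# Size names and pixel sizes from the slide's table.
# Assumption: the URL path uses the lowercase form of each size name.
SIZES_PX = {
    "original": 2048,
    "large": 1024,
    "medium": 500,
    "small": 240,
    "thumb": 100,
    "square": 75,  # 75px x 75px
}


def photo_url(photo_id, size="medium", extension="jpg"):
    """Build a photo URL for the given ID and size (hypothetical helper)."""
    if size not in SIZES_PX:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(SIZES_PX)}")
    return f"{BASE_URL}/{photo_id}/{size}.{extension}"


# 12345 is a made-up photo ID used only to show the resulting shape.
print(photo_url(12345, "small"))
```

Choosing a smaller size such as `medium` or `small` reduces transfer volume when full-resolution images are not needed for classification.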
Data Extraction for Classification
Challenges:
Insecta Dataset
Scaling and Parallelization
Dataflow
Repeat for each species
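The "repeat for each species" step lends itself to parallel execution because each species can be handled independently. The sketch below shows one way to fan that work out with a thread pool. It is not the authors' implementation: `process_species` is a hypothetical placeholder for the real per-species work, and the species names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def process_species(species):
    """Hypothetical placeholder for the per-species step (e.g. fetching its photos)."""
    return {"species": species, "status": "done"}


def run_for_all_species(species_list, worker=process_species, max_workers=8):
    """Apply `worker` to every species concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, species_list))


# Illustrative species names only.
results = run_for_all_species(["Apis mellifera", "Danaus plexippus"])
for result in results:
    print(result["species"], result["status"])
```

Threads are a reasonable fit when the per-species step is dominated by network I/O; CPU-heavy steps would more likely call for processes or batch jobs on a cluster.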
Additional Features
A sunburst plot is a data visualization technique that displays hierarchical data in a radial layout.
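To make the hierarchical structure concrete, the sketch below rolls per-species counts up a taxonomy path into (node, parent, total) entries, which is the general shape a sunburst plot is drawn from. The function name, the path-based node IDs, and the counts are all illustrative assumptions, not data from iNaturalist.

```python
from collections import defaultdict


def sunburst_nodes(rows):
    """Aggregate (path, count) rows into {node_id: (parent_id, total)}.

    Each path is a tuple from the root down to a leaf. Every ancestor
    receives the sum of the counts of the leaves beneath it.
    """
    totals = defaultdict(int)
    parents = {}
    for path, count in rows:
        for depth in range(1, len(path) + 1):
            node_id = "/".join(path[:depth])
            parents[node_id] = "/".join(path[:depth - 1])  # "" marks a root
            totals[node_id] += count
    return {node: (parents[node], totals[node]) for node in parents}


# Made-up counts, for illustration only.
rows = [
    (("Insecta", "Lepidoptera", "Danaus plexippus"), 3),
    (("Insecta", "Lepidoptera", "Vanessa cardui"), 2),
    (("Insecta", "Hymenoptera", "Apis mellifera"), 5),
]
nodes = sunburst_nodes(rows)
for node_id, (parent_id, total) in nodes.items():
    print(f"{node_id!r} parent={parent_id!r} total={total}")
```

Each ring of the sunburst corresponds to one depth in the path, and the angular width of a segment is proportional to its total.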
iNaturalist Insect Top 100 Species interactive visualization
iNaturalist is constantly updated. iNatSD includes an update feature to repeat downloads.
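The slides do not describe how the update feature works internally. One plausible approach, sketched here under that assumption, is to compare the photo IDs in the latest metadata against those already on disk and fetch only the difference. The function name and the IDs are hypothetical.

```python
def photos_to_download(remote_ids, local_ids):
    """Return IDs present in the latest metadata but not yet downloaded, sorted."""
    return sorted(set(remote_ids) - set(local_ids))


# Made-up photo IDs: 1 and 2 are already on disk, 3 and 5 are new.
pending = photos_to_download(remote_ids=[3, 1, 2, 5], local_ids=[1, 2])
print(pending)
```

Because only missing IDs are returned, rerunning the download after a dataset refresh does not repeat work already done.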
What is the community building?
New, improved workflow using ML
Train machine learning model to detect leaves
Annotate images
Build Streamlit app
(Hosted on CyVerse as VICE app)
Researcher uploads new data to CyVerse
Researcher can use Streamlit app to run ML model on new data
Add unique QR codes for each plant
QR codes
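A QR code is only useful for tracking if the text it encodes is unique per plant. The sketch below generates reproducible unique payload strings that a QR code generator could then encode. The naming scheme (`plot-01-plant-01`), the `plant:` prefix, and the function name are assumptions for illustration; the slides do not specify what the codes contain.

```python
import uuid


def plant_qr_payloads(plant_names, prefix="plant:"):
    """Map each plant name to a deterministic UUID string (hypothetical scheme).

    uuid5 is derived from a hash of the name, so the same plant always
    gets the same payload and distinct names get distinct payloads.
    """
    return {
        name: str(uuid.uuid5(uuid.NAMESPACE_URL, prefix + name))
        for name in plant_names
    }


# Made-up plant labels.
payloads = plant_qr_payloads(["plot-01-plant-01", "plot-01-plant-02"])
for name, payload in payloads.items():
    print(name, payload)
```

Deterministic IDs mean labels can be reprinted later without keeping a separate lookup table of previously issued codes.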
Annotate images
https://labelstud.io
Machine Learning Model
Final output is a .pth file
(PyTorch ML model weights)
So we have a model... now what?
New, improved workflow using ML
Train machine learning model to detect leaves
Annotate images
Build Streamlit app
(CyVerse VICE app)
Researcher uploads new data to CyVerse
Researcher uses app in DE to run ML model on new data
Add unique QR codes for each plant
Model weights
Where to host your app?