DROID Workshop - Core Document

Table of Contents

Overview

Shortened Document Link, Handout Link & Remote Connectivity

Twitter Hashtag

Lightning Round (Notes)

Training session: Workflow Core Concepts (Notes)

Presentation: Workflow Elements and Concepts - Common Practices (Notes)

Presentation: Social Issues in Collaborative Digitization (Notes)

Day 1 Breakout Group Documentation

Breakout Group A - Specimens on flat sheets/in packets

~ 20 minutes: Current Workflow Constraints

~ 10 minutes: Workflow Processes Across Institutions

~ 30 minutes: Metrics & How to Measure Success

~ 45 minutes: Workflow evaluation matrix

Breakout Group B - Specimens pinned in trays

~ 20 minutes: Current Workflow Constraints

~ 10 minutes: Workflow Processes Across Institutions

~ 30 minutes: Metrics & How to Measure Success

~ 45 minutes: Workflow evaluation matrix

Breakout Group C - Three-dimensional specimens in boxes/drawers & jars

~ 20 minutes: Current Workflow Constraints

~ 10 minutes: Workflow Processes Across Institutions

~ 30 minutes: Metrics & How to Measure Success

~ 45 minutes: Workflow evaluation matrix

Breakout Group reports to the re-assembled Plenary Group (Notes)

Pre-Workshop Survey results and discussion (Notes)

Training session: Business Process Modeling (Notes)

Day 2 Breakout Group Documentation - Workflows

Breakout Group A - Specimens on flat sheets/in packets

Breakout Group B - Specimens pinned in trays

Breakout Group C - Three-dimensional specimens in boxes/drawers & specimens in spirits in jars

Plenary: reports back from the Breakout Groups and discussion (Notes)

Vision for the Future / Minority Reports / Out-of-the-box ideas (Notes)

Plenary wrap-up discussion. DROID Working Group strategy for polishing and dissemination of workshop products. (Notes)

Word Bank for Common Terms

Workshop Agenda

Software Tools

Overview

Thank you for participating in the development and documentation of improved biodiversity digitization workflows in the DROID (Developing Robust Object-to-Image-to-Data) Workshop. This workshop is expected to generate a number of paper as well as digital artifacts. To facilitate the consolidation and publication of data, and to encourage community contribution, we ask that all digital data be stored directly within this single core workshop document. Modifications and edits from all participants are encouraged throughout the Workshop. Please ensure that any paper artifacts are turned in to a Workshop staff member before the close of the Workshop (be sure to include your name, Working Group session, and other pertinent information on the paper artifact to help identify its content).

Shortened Document Link, Handout Link & Remote Connectivity

URL to this online Google Document - http://tinyurl.com/d2hxs8z

URL to DROID Handouts in Google Documents - http://tinyurl.com/7ejqvvm

URL to the DROID Adobe Connect site - https://idigbio.adobeconnect.com/droid 

Twitter Hashtag

#idigbio

Lightning Round (Notes)

Presentation Order:

Several institutions are using open-source software; however, most are also using (or augmenting other software with) ad-hoc, internally written software. This software is not shared with other institutions, or even with other collections within the same institution, due to concerns about software quality (authors are not comfortable that it is good enough for sharing), lack of documentation, and concerns about having to support the software as other collections/institutions adopt it and run into issues or questions.

There is no current consensus on a workflow documentation protocol/software. The workshop will focus on the processes first, and then take a closer look at workflow documentation protocol/software selection.

Leveraging already-databased collecting events helps to pre-fill data for newly databased specimens from the same event. However, this is primarily taken advantage of only within individual collections databases. Combining and searching data from all collections/institutions would prove even more helpful: a role for Scatter/Gather/Reconcile (SGR), although SGR is currently only populated with herbarium data.
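The collecting-event reuse described above amounts to a lookup: given a record already databased from the same collecting event, pre-fill the shared fields of a new specimen record. A minimal sketch follows; the field names (`collector`, `event_date`, `locality`) and the event-id key are hypothetical, not SGR's actual schema.

```python
# Minimal sketch of pre-filling a new specimen record from an
# already-databased record that shares the same collecting event.
# Field names here are hypothetical, not an actual SGR schema.

# Fields shared by every specimen from one collecting event.
EVENT_FIELDS = ("collector", "event_date", "locality", "lat", "lon")

def prefill_from_event(event_index, event_id, new_record):
    """Copy event-level fields into new_record, leaving any
    values already present (e.g. keyed by hand) untouched."""
    match = event_index.get(event_id)
    if match is None:
        return new_record  # nothing databased for this event yet
    for field in EVENT_FIELDS:
        new_record.setdefault(field, match[field])
    return new_record

# One previously databased record, indexed by a collecting-event id.
index = {
    "JB-1987-042": {
        "collector": "J. Beach",
        "event_date": "1987-06-12",
        "locality": "Douglas Co., Kansas",
        "lat": 38.95,
        "lon": -95.25,
    }
}

record = prefill_from_event(index, "JB-1987-042", {"barcode": "KU0001234"})
print(record["collector"], record["locality"])
```

The same idea scales from one collection database to a cross-institution index, which is the gap SGR aims to fill.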

Training session: Workflow Core Concepts (Notes)

“If you want to succeed, double your failure rate!  Fail in new, useful, and educational ways.”

Presentation: Workflow Elements and Concepts - Common Practices (Notes)

Presentation: Social Issues in Collaborative Digitization (Notes)

Day 1 Breakout Group Documentation

Breakout groups are preconceived groupings based on preservation type. These groupings are for consideration only and may be modified based on feedback from participants.

Breakout Group A - Specimens on flat sheets/in packets

http://idigbio.adobeconnect.com/droid1/

Members of Group A: Dorothy Allard, Les Landrum, Melissa Tulig, Michael Bevans, Ed Gilbert, Jason Best, Rusty Russell, Austin Mast (Moderators: Larry Page, Chris Norris, Jason Grabon)

~ 20 minutes: Current Workflow Constraints

Current workflow constraints and proposed solutions (process, technology, staffing, funding, institutional culture/psychology, more...)

~ 10 minutes: Workflow Processes Across Institutions

Elements that would cause a workflow to diverge from one institution to the next (volunteerism, level of professional expertise within the digitization process, funds, more...)

~ 30 minutes: Metrics & How to Measure Success

What defines a “successfully digitized object” (the outcome of an optimal workflow, including databasing, geo-referencing, etc) and measurements of success (cost per specimen, throughput per hour, minimization of level of knowledge required to fully digitize an object via process and tools, queue times, more...)

~ 45 minutes: Workflow evaluation matrix

As the biodiversity collections community moves forward with digitization efforts, we need strategies not only for documenting workflows, but also systematic methods for evaluating workflows to look for ways to increase efficiency. Some synonyms for efficiency include: effectiveness, efficaciousness, productiveness. While speeding up and automating processes certainly improves efficiency, there are other related factors to consider that, if optimized, can minimize damage to specimens, influence data quality, and increase worker satisfaction.

With this in mind, please consider the matrices below as a starting point to develop a methodical way to try and find various points in our workflows where productiveness might be increased. We look forward to your input on these forms and tweaking them to add value.

Look for opportunities to increase workflow efficiency in a systematic manner. How might one increase efficiency?

 

It is our plan to utilize the data captured in these forms to compile lists of community needs in each area (e.g., software development, sharing existing physical tools, a list of steps that can be done with citizen scientists, ...).

At the end of this section there is a sample set with comments to show how these documents may help the community coalesce around these ideas.

Pre-Digitization Curation Tasks

Evaluation columns (mark each task against these): must be done before digitization; could be done at or after digitization; could be done by local volunteers, students, or non-PI staff; could be done remotely (i.e., crowdsourced); represents a step that could be automated; would benefit from QA/QC; could be done with current existing machinery (e.g., Kirtas); could benefit from authority file creation or sharing (if one exists); a physical tool exists to speed up or otherwise make the task more efficient; time/costs for this task are easily computed; formulas exist.

Tasks (no evaluation columns were marked):

- identify specimens to be digitized
- identify location of specimen
- remove specimen from collection
- document/flag location to enable return of the specimen
- apply barcode
- hiring and training staff
- conservation and collection
- complete a project management plan
- specimen repair
- select/purchase hardware
- select/install/configure software
- identify authority files
- configure imaging station with a set scale and color chart

Imaging Specimen Tasks (label may be with specimen)

Evaluation columns as in the Pre-Digitization matrix above, except the first two refer to the imaging step (must be done before imaging; could be done at or after imaging).

Tasks (no evaluation columns were marked):

- place scale and color bar in the imaging frame
- calibrate camera to balance exposure and white balance based upon the color chart
- photograph the herbarium sheet
- select specimens with key features for close-up images, and image those specimens
- optional: rename file
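The camera-calibration task above (balancing exposure and white balance against the color chart) comes down to measuring the chart's neutral-gray patch and computing per-channel gains that make it read as neutral. A minimal sketch, with illustrative patch values rather than readings from any real chart:

```python
# Sketch of white-balance calibration against a color chart:
# measure the chart's neutral-gray patch, then compute per-channel
# gains that make the patch read as neutral gray. Patch values here
# are illustrative, not taken from a real chart.

def white_balance_gains(gray_patch_rgb, target=200.0):
    """Per-channel multipliers that map the measured gray patch
    to a neutral target value."""
    return tuple(target / c for c in gray_patch_rgb)

def apply_gains(pixel, gains):
    # Clamp to the 8-bit range after scaling.
    return tuple(min(255, round(c * g)) for c, g in zip(pixel, gains))

# Camera sees the gray patch with a warm (reddish) cast:
measured_patch = (220.0, 200.0, 180.0)
gains = white_balance_gains(measured_patch)

# After correction the patch itself reads neutral:
corrected = apply_gains(measured_patch, gains)
print(corrected)  # (200, 200, 200)
```

In practice the camera or imaging software performs this correction internally from a custom white-balance reading of the chart; the sketch just shows what that reading computes.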

Post Image Capture Image Processing Tasks

Evaluation columns as in the Imaging matrix above.

Tasks (no evaluation columns were marked):

- save archival copy
- optional: rename file
- create a web-presentation file
- add metadata (TBD, including copyright, photographer, type of photo, etc.)
- apply color adjustment (controversial)
- optional: redact locality information for sensitive specimens
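The "save archival copy" and "rename file" steps above are often combined: keep the camera's original untouched in an archive folder, and give the working copy a barcode-based name so downstream steps can match image to specimen. A sketch of that convention; the naming scheme (barcode plus original extension) is an example, not a community standard:

```python
# Sketch of the "save archival copy" and "rename file" steps:
# copy the camera's original into an archive folder untouched,
# then give the working copy a barcode-based name. The naming
# scheme (BARCODE.jpg) is an example, not a standard.
import shutil
import tempfile
from pathlib import Path

def archive_and_rename(camera_file: Path, barcode: str,
                       archive_dir: Path, work_dir: Path) -> Path:
    archive_dir.mkdir(parents=True, exist_ok=True)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Archival copy keeps the original camera filename.
    shutil.copy2(camera_file, archive_dir / camera_file.name)
    # Working copy gets the specimen's barcode as its name.
    renamed = work_dir / f"{barcode}{camera_file.suffix}"
    shutil.copy2(camera_file, renamed)
    return renamed

root = Path(tempfile.mkdtemp())
original = root / "IMG_0001.jpg"
original.write_bytes(b"fake image bytes")

out = archive_and_rename(original, "KU0001234",
                         root / "archive", root / "working")
print(out.name)                                      # KU0001234.jpg
print((root / "archive" / "IMG_0001.jpg").exists())  # True
```

Keeping the untouched archival copy is what makes the "controversial" color-adjustment step reversible.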

Capture Specimen Data from Image (Or Specimen Label) Tasks

Evaluation columns as in the Imaging matrix above.

Tasks (no evaluation columns were marked):

- access queued images requiring data capture
- database utilizing voice recognition
- OCR
- NLP
- validate OCR results
- correct OCR errors
- execute NLP
- keystroking (internal project team)
- crowdsourced keystroking
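The OCR and NLP tasks above boil down to pulling structured fields out of transcribed label text. A minimal regex-based sketch of that parse step; a real NLP pass handles far messier labels, and the label text and patterns here are invented for illustration:

```python
# Minimal sketch of the OCR -> parse step: pull a few structured
# fields out of transcribed label text with regular expressions.
# (A real NLP pass handles far messier labels; the label text and
# field patterns here are invented for illustration.)
import re

LABEL = """FLORA OF KANSAS
Solidago rigida L.
Douglas Co.: 3 mi N of Lawrence
Coll. J. Beach  No. 1842   12 June 1987"""

def parse_label(text):
    fields = {}
    m = re.search(r"Coll\.\s+(.+?)\s+No\.\s+(\d+)", text)
    if m:
        fields["collector"] = m.group(1)
        fields["collector_number"] = m.group(2)
    m = re.search(r"(\d{1,2}\s+\w+\s+\d{4})", text)
    if m:
        fields["date"] = m.group(1)
    m = re.search(r"([A-Z][a-z]+)\s+Co\.", text)
    if m:
        fields["county"] = m.group(1)
    return fields

print(parse_label(LABEL))
```

Whether the transcription comes from OCR, voice recognition, or keystroking, the parsed fields feed the same validation steps in the next matrix.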

Post Specimen Data Capture Quality Analysis / Quality Control Tasks

Evaluation columns as in the Imaging matrix above.

Tasks (no evaluation columns were marked):

- validate country, state, and county against authority files
- programmatically validate lat/long coordinates
- validate taxonomy against authority file

Note: a common QA/QC tool would be extremely helpful for the community.
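The first two validation tasks above can be sketched in a few lines: check the record's country/state pair against an authority list and range-check its coordinates. The authority entries below are an illustrative subset, not a real authority file:

```python
# Sketch of the automated QA/QC checks named above: validate a
# record's country/state against a small authority list and
# range-check its lat/long. The authority entries are an
# illustrative subset, not a real authority file.
AUTHORITY = {
    ("United States", "Kansas"),
    ("United States", "Florida"),
    ("Mexico", "Sonora"),
}

def qc_record(record):
    """Return a list of human-readable QC problems (empty = passed)."""
    problems = []
    if (record.get("country"), record.get("stateProvince")) not in AUTHORITY:
        problems.append("country/state not in authority file")
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if lon is None or not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems

good = {"country": "United States", "stateProvince": "Kansas",
        "lat": 38.9, "lon": -95.2}
bad = {"country": "United States", "stateProvince": "Kansass",
       "lat": 138.9, "lon": -95.2}
print(qc_record(good))  # []
print(qc_record(bad))   # two problems
```

A shared tool of exactly this shape, run against community authority files, is what the note above is asking for.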

Georeferencing Tasks

Evaluation columns as in the Imaging matrix above. No tasks were entered in this matrix.

Comments:

Sample Pre Digitization Curation Tasks:

Specimen Accession, Specimen Cataloging, Interview Staff, Hire Staff, Train Staff, Decide What to Digitize, Pull Specimens, Sort Specimens (e.g., by Taxon, Sex, Geographic Region, Collecting Event, Collector, Color, Size, Shape), Add Taxon Names to Database, Update Taxonomic Identification on Specimens (e.g., vet type specimens)

Sample Imaging Tasks:

Affix Barcode, Turn on Camera, Check Camera Settings, Check Lighting, Order Specimens, Take Photos, Stamp Specimen as “Imaged”, Return Specimen to Collection

 

Sample Post Image Capture Image Processing Tasks:

Name images, Rename Images, Store Original, Crop, Make Derivatives, Color Correction

Sample Capture Specimen Data from Image (Or Specimen Label) Tasks:

Turn on Computer, Log In (Remote or on Site), Open Image, Enter Taxon Data, Enter Locality Data, Enter Specimen Record (All Data), Enter Only Minimal Fields, Built in Quality Control Steps In Situ

Sample Post Specimen Data Capture Quality Analysis / Quality Control Tasks:

Turn on Computer, Log In (Remote or on Site), Automated QA/QC – Taxon Names; Collector Names; Place Names; County-State Validation

Sample Georeferencing Tasks:

Turn on Computer, Log In (Remote or on Site), One Record At A Time, Batch Georef Processing
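Batch georeference processing usually includes a sanity check that assigned coordinates fall within a plausible radius of the named locality; the great-circle (haversine) distance is the standard tool for that. A sketch, with illustrative coordinates and an assumed 25 km tolerance:

```python
# Great-circle (haversine) distance, commonly used in batch
# georeferencing QC to check that assigned coordinates fall within
# a plausible radius of the named locality. Coordinates and the
# 25 km tolerance below are illustrative.
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Is the record's point within 25 km of the locality reference?
reference = (38.97, -95.24)   # Lawrence, KS (approx.)
record = (38.80, -95.20)
distance = haversine_km(*reference, *record)
print(distance < 25)  # True
```

One-record-at-a-time georeferencing applies the same check interactively; batch processing just loops it over a queue.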

Breakout Group B - Specimens pinned in trays

http://idigbio.adobeconnect.com/droid2/

REVISION HISTORY (for this Group B section only). Discussion 5/30/12, 4:00-5:30 PM: Deb Paul made the initial notes below as group scribe; Deb updated the notes late P.M. 5/30 and early A.M. 5/31. Deb created an associated Word document for easier editing; for ease of use, the table in the fourth item below was edited in the Word document, not here. Jim Beach edited this document 5/31/12 at 5:30 AM.

Members of Group B: Jennifer Thomas, Paul Heinrich, Paul Morris, Petra Sierwald, Dmitry Dmitriev and Moderators: Jim Beach & Deb Paul, Stan Blum and 1-2 others online

~ 20 minutes: Current Workflow Constraints

Current workflow constraints and proposed solutions (process, technology, staffing, funding, institutional culture/psychology, more...)

~ 10 minutes: Workflow Processes Across Institutions

Elements that would cause a workflow to diverge from one institution to the next (volunteerism, level of professional expertise within the digitization process, funds, more...)

~ 30 minutes: Metrics & How to Measure Success

What defines a “successfully digitized object” (the outcome of an optimal workflow, including databasing, geo-referencing, etc) and measurements of success (cost per specimen, throughput per hour, minimization of level of knowledge required to fully digitize an object via process and tools, queue times, more...)

From Stan Blum (CAS) online:  “Success” can be understood as a set of capabilities:  

Cost issues:

Stan Blum: metrics differ from project to project. Can we break them down?

Regarding “bad data records”?

~ 45 minutes: Workflow evaluation matrix

As the biodiversity collections community moves forward with digitization efforts, we need strategies not only for documenting workflows, but also systematic methods for evaluating workflows to look for ways to increase efficiency. Some synonyms for efficiency include: effectiveness, efficaciousness, productiveness. While speeding up and automating processes certainly improves efficiency, there are other related factors to consider that, if optimized, can minimize damage to specimens, influence data quality, and increase worker satisfaction.

With this in mind, please consider the matrices below as a starting point to develop a methodical way to try and find various points in our workflows where productiveness might be increased. We look forward to your input on these forms and tweaking them to add value.

Look for opportunities to increase workflow efficiency in a systematic manner. How might one increase efficiency?

 

It is our plan to utilize the data captured in these forms to compile lists of community needs in each area (e.g., software development, sharing existing physical tools, a list of steps that can be done with citizen scientists, ...).

At the end of this section there is a sample set with comments to show how these documents may help the community coalesce around these ideas.

Pre-Digitization Curation Tasks

Evaluation columns as in the Group A matrices above.

Tasks (no evaluation columns were marked):

- define what is in scope in the proposal
- undergraduates (UGs) create a provisional taxon authority file, family by family, by going into the collection (~2 days): open a cabinet, remove 5 drawers, type in the taxa, put the trays back
- that list of names goes to an in-house or external taxon expert for validation and is returned marked up with taxon placement changes
- specimens are relocated based on the taxon changes and unit trays are all relabeled (UGs do this); all affected specimens are relocated, not just those being digitized
- during the relocation process, if a unit tray needs expansion, a new box is put in the drawer for those specimens; later, as specimens are barcoded, UGs move densely packed specimens into new, empty unit trays; this is repeated for the entire section, and drawers are labeled and initialed by students to track who did what
- series are sorted in unit trays by collecting event and then by host plant (by UGs); all specimens that look identical are put together in a “duplicate” series, all barcoded, then put back into the unit tray (or an expansion tray if needed); within a single unit tray all barcode numbers are sequential, which makes data entry easier because sequential numbers can be added to the spreadsheet with Excel’s auto-increment drag-and-drop function
- drawer numbers are used for tracking as folder names: images go into the filespace folder named for that drawer number, and when all images have been attached to the collection objects in the database the temporary folder is deleted
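The sequential-barcode convention described above (consecutive barcode numbers within a unit tray, so a spreadsheet column can be filled by auto-increment) can be sketched as generating the run of numbers for a tray. The prefix and zero-padding width below are hypothetical, not an institutional standard:

```python
# Sketch of the sequential-barcode convention above: specimens in a
# unit tray get consecutive barcode numbers, so a spreadsheet column
# can be filled by auto-increment. The prefix and zero-padding width
# are hypothetical, not an institutional standard.
def tray_barcodes(prefix, start, count, width=7):
    """Consecutive, zero-padded barcodes for one unit tray."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]

tray = tray_barcodes("ABC", 4210, 4)
print(tray)  # ['ABC0004210', 'ABC0004211', 'ABC0004212', 'ABC0004213']
```

Excel's drag-and-drop auto-increment produces the same run from the first value, which is why keeping each tray's numbers consecutive pays off at data-entry time.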

Imaging Specimen Tasks

Evaluation columns as in the Group A matrices above. No tasks were entered in this matrix.

Post Image Capture Image Processing Tasks

Evaluation columns as in the Group A matrices above. No tasks were entered in this matrix.

Capture Specimen Data from Image (Or Specimen Label) Tasks

Evaluation columns as in the Group A matrices above. No tasks were entered in this matrix.

Post Specimen Data Capture Quality Analysis / Quality Control Tasks

Evaluation columns as in the Group A matrices above. No tasks were entered in this matrix.

Georeferencing Tasks

Evaluation columns as in the Group A matrices above. No tasks were entered in this matrix.

Comments:

Sample Pre Digitization Curation Tasks:

Specimen Accession, Specimen Cataloging, Interview Staff, Hire Staff, Train Staff, Decide What to Digitize, Pull Specimens, Sort Specimens (e.g., by Taxon, Sex, Geographic Region, Collecting Event, Collector, Color, Size, Shape), Add Taxon Names to Database, Update Taxonomic Identification on Specimens (e.g., vet type specimens)

Sample Imaging Tasks:

Affix Barcode, Turn on Camera, Check Camera Settings, Check Lighting, Order Specimens, Take Photos, Stamp Specimen as “Imaged”, Return Specimen to Collection

 

Sample Post Image Capture Image Processing Tasks:

Name images, Rename Images, Store Original, Crop, Make Derivatives, Color Correction

Sample Capture Specimen Data from Image (Or Specimen Label) Tasks:

Turn on Computer, Log In (Remote or on Site), Open Image, Enter Taxon Data, Enter Locality Data, Enter Specimen Record (All Data), Enter Only Minimal Fields, Built in Quality Control Steps In Situ

Sample Post Specimen Data Capture Quality Analysis / Quality Control Tasks:

Turn on Computer, Log In (Remote or on Site), Automated QA/QC – Taxon Names; Collector Names; Place Names; County-State Validation

Sample Georeferencing Tasks:

Turn on Computer, Log In (Remote or on Site), One Record At A Time, Batch Georef Processing

Breakout Group C - Three-dimensional specimens in boxes/drawers & specimens in spirits in jars

http://idigbio.adobeconnect.com/droid3/

Members of Group C: Linda Ford, Dean Pentcheff, Talia Karim, Louis Zachos, Andy Bentley, Laurie Taylor  (Moderators: Gil Nelson, Amanda Neill, Laurie Taylor)

~ 20 minutes: Current Workflow Constraints

Current workflow constraints and proposed solutions (process, technology, staffing, funding, institutional culture/psychology, more...)

~ 10 minutes: Workflow Processes Across Institutions

Elements that would cause a workflow to diverge from one institution to the next (volunteerism, level of professional expertise within the digitization process, funds, more...)

~ 30 minutes: Metrics & How to Measure Success

What defines a “successfully digitized object” (the outcome of an optimal workflow, including databasing, geo-referencing, etc) and measurements of success (cost per specimen, throughput per hour, minimization of level of knowledge required to fully digitize an object via process and tools, queue times, more...)

~ 45 minutes: Workflow evaluation matrix

As the biodiversity collections community moves forward with digitization efforts, we need strategies not only for documenting workflows, but also systematic methods for evaluating workflows to look for ways to increase efficiency. Some synonyms for efficiency include: effectiveness, efficaciousness, productiveness. While speeding up and automating processes certainly improves efficiency, there are other related factors to consider that, if optimized, can minimize damage to specimens, influence data quality, and increase worker satisfaction.

With this in mind, please consider the matrices below as a starting point to develop a methodical way to try and find various points in our workflows where productiveness might be increased. We look forward to your input on these forms and tweaking them to add value.

Look for opportunities to increase workflow efficiency in a systematic manner. How might one increase efficiency?

 

It is our plan to utilize the data captured in these forms to compile lists of community needs in each area (e.g., software development, sharing existing physical tools, a list of steps that can be done with citizen scientists, ...).

At the end of this section there is a sample set with comments to show how these documents may help the community coalesce around these ideas.

Pre-Digitization Curation Tasks

Evaluation columns as in the Group A matrices above.

Tasks (an [x] notes a marked evaluation column):

- access to label data from container / removing specimens from containers
- investigate & document hazardous materials issues associated with retrieval [x: must be done before digitization]
- place specimens in wet box [x: must be done before digitization]
- add color and scale bars [x: must be done before digitization]

Imaging Specimen Tasks

Evaluation columns as in the Group A matrices above.

Tasks (an [x] notes a marked evaluation column):

- specimen cleaning & prep
- mounting for photo orientation [x: must be done before imaging]
- image stacking

Capture Specimen Data from Image (Or Specimen Label) Tasks

Evaluation columns as in the Group A matrices above. No tasks were entered in this matrix.

Post Specimen Data Capture Quality Analysis / Quality Control Tasks

Evaluation columns as in the Group A matrices above. No tasks were entered in this matrix.

Georeferencing Tasks

(Evaluation matrix with the same criteria columns as above; no task rows were filled in.)

Comments:

Sample Pre Digitization Curation Tasks:

Specimen Accession, Specimen Cataloging, Interview Staff, Hire Staff, Train Staff, Decide What to Digitize, Pull Specimens, Sort Specimens (e.g., by Taxon, Sex, Geographic Region, Collecting Event, Collector, Color, Size, Shape), Add Taxon Names to Database, Update Taxonomic Identification on Specimens (e.g., vet type specimens)
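Several of these curation tasks, such as sorting specimens by taxon or collecting event, are easy to script once records are in a database. A minimal sketch, assuming records are plain dictionaries (the field names and values here are hypothetical):

```python
from operator import itemgetter

# Hypothetical specimen records pulled from a collection database.
specimens = [
    {"taxon": "Quercus alba", "collector": "Smith", "event": "2011-07-04"},
    {"taxon": "Acer rubrum", "collector": "Jones", "event": "2010-05-12"},
    {"taxon": "Acer rubrum", "collector": "Smith", "event": "2009-03-30"},
]

# Sort by taxon first, then by collecting event, mirroring a physical
# pre-digitization sort (e.g., by Taxon, then Collecting Event).
ordered = sorted(specimens, key=itemgetter("taxon", "event"))
```

Swapping the key tuple (e.g., `itemgetter("collector", "taxon")`) reproduces any of the other sort orders listed above.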

Sample Imaging Tasks:

Affix Barcode, Turn on Camera, Check Camera Settings, Check Lighting, Order Specimens, Take Photos, Stamp Specimen as “Imaged”, Return Specimen to Collection

 

Sample Post Image Capture Image Processing Tasks:

Name images, Rename Images, Store Original, Crop, Make Derivatives, Color Correction
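Renaming images and naming derivatives usually follows a fixed convention; the barcode-based convention below is only an assumed example, not a community standard:

```python
from pathlib import Path

def derivative_names(original: str, barcode: str):
    """Given a camera filename and a specimen barcode, return the archival
    name plus web and thumbnail derivative names (hypothetical convention)."""
    ext = Path(original).suffix.lower()  # e.g. ".jpg"
    return {
        "archival": f"{barcode}{ext}",        # full-resolution original
        "web": f"{barcode}_web{ext}",         # resized copy for web display
        "thumb": f"{barcode}_thumb{ext}",     # small catalog thumbnail
    }

names = derivative_names("IMG_0001.JPG", "BRIT0012345")  # BRIT... is a made-up barcode
```

Keeping the mapping in one function means the "Rename Images" and "Make Derivatives" steps always agree on file names.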

Sample Capture Specimen Data from Image (Or Specimen Label) Tasks:

Turn on Computer, Log In (Remote or on Site), Open Image, Enter Taxon Data, Enter Locality Data, Enter Specimen Record (All Data), Enter Only Minimal Fields, Built in Quality Control Steps In Situ

Sample Post Specimen Data Capture Quality Analysis / Quality Control Tasks:

Turn on Computer, Log In (Remote or on Site), Automated QA/QC – Taxon Names; Collector Names; Place Names; County-State Validation
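The county-state validation listed above can be automated against an authority file. A minimal sketch, assuming the authority file has been loaded into a dict (the entries shown are illustrative only):

```python
# Hypothetical authority file: state -> set of valid county names.
COUNTIES = {
    "Florida": {"Alachua", "Marion", "Leon"},
    "Texas": {"Tarrant", "Travis"},
}

def check_county(state: str, county: str) -> bool:
    """True if the county belongs to the stated state per the authority file."""
    return county in COUNTIES.get(state, set())

# Records failing the check are queued for human review rather than
# auto-corrected, since the error may be in either field.
flagged = [r for r in [("Florida", "Tarrant"), ("Texas", "Travis")]
           if not check_county(*r)]
```

The same pattern (lookup against an authority file, flag for review) applies to taxon names, collector names, and place names.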

Sample Georeferencing Tasks:

Turn on Computer, Log In (Remote or on Site), One Record At A Time, Batch Georef Processing
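Batch georeferencing can be sketched as a lookup of verbatim locality strings against a gazetteer, attaching an error radius to record precision; records that miss fall back to one-at-a-time manual work. The gazetteer entries and field layout below are illustrative assumptions:

```python
# Hypothetical gazetteer: normalized locality -> (lat, long, error radius in m).
GAZETTEER = {
    "gainesville, florida": (29.6516, -82.3248, 5000),
    "fort worth, texas": (32.7555, -97.3308, 8000),
}

def georeference(verbatim: str):
    """Return (lat, long, error_m) for a locality string, or None so the
    record can fall back to one-record-at-a-time manual georeferencing."""
    return GAZETTEER.get(verbatim.strip().lower())

hits = {loc: georeference(loc)
        for loc in ["Gainesville, Florida", "Unknown Creek, Texas"]}
```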

Breakout Group reports to the re-assembled Plenary Group (Notes)

Pre-Workshop Survey results and discussion (Notes)

Self-assessment

16 respondents

Often protocols for imaging, databasing, workflow in place

Rarely protocols for hardware, software, training staff, or georeferencing

Most respondents reported doing some kind of image manipulation, most are saving images as JPEGs.

Most work being done by students or paid staff

Only two crowdsourcing projects (one more like citizen science than true crowdsourcing)

Damage to specimens does occur, stats are not kept, damage is usually repaired immediately

79% reported benefits to digitization

Training session: Business Process Modeling (Notes) -- Jason Grabon

Workflows should not be static -- they become less efficient over time

Workflows should be continually improved and should have redundancy built in

Work breakdown structure (task list) + dependencies for each task are most important for us today

With extra time, add human/physical resources and time lags

The only way to definitively tell which workflow is most efficient is to actually use them and time them.

Suggestion of log sheets attached to cabinet or drawer with steps listed, having staff write date/time for each step.

Suggestion of development of modular processes that each institution can pick and choose from and use easily.

A training document should have a checklist first, then the written details.
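The work breakdown structure plus per-task dependencies described above is already enough to compute a valid execution order automatically. A minimal sketch using a topological sort (the task names are examples, not a prescribed workflow):

```python
from graphlib import TopologicalSorter

# Work breakdown structure: task -> set of tasks it depends on.
wbs = {
    "pull specimen": set(),
    "set up camera": set(),
    "image specimen": {"pull specimen", "set up camera"},
    "rename file": {"image specimen"},
    "return specimen": {"image specimen"},
}

# Dependencies always appear before the tasks that need them.
order = list(TopologicalSorter(wbs).static_order())
```

Adding per-task durations (e.g., from the log sheets suggested above) to this structure is the basis for identifying bottlenecks and comparing workflow variants.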

Day 2 Breakout Group Documentation - Workflows

www.idigbio.org/sites/default/files/sites/default/files/Business%20Process%20Management.pptx 

www.idigbio.org/sites/default/files/videos/slides/Nelson_DROID.pptx

Breakout Group A - Specimens on flat sheets/in packets

IDENTIFY TOOLS THAT CAN HELP WITH THESE TASKS

Module 1: Project Management

Task list (the Dependency(ies) and Resource(s) columns were left blank):

T1: Define scope of project and goals
T2: Evaluate, select, purchase equipment and software
T3: Coordinate grant-funded projects
T4: Hire staff
T5: Define practical scope
T6: Identify IT requirements
T7: Purchase/obtain IT services
T8: Set up project meetings
T9: Feedback
T10: Train staff
T11: Define schedules/timeline
T12: Create documentation
T13: Create/identify authority files
T14: Budget management/accounting reporting
T15: Reporting
T16: Integration with other activities
T17: Sustainability plan
T18: Install equipment

Module 2: Pre-Digitization Curation

Task list (the Dependency(ies) and Resource(s) columns were left blank):

T1: Identify specimens to be digitized
T2: Identify location of specimen
T3: Remove specimen from collection and bring to imaging station
T4: Document/flag location to enable return of the specimen
T5: Apply barcode
T6: Specimen conservation
T7: Select specimens with key features for close-up images
T8: Publication
T9: Quality Control/QA
T10: Archiving
T11: Create skeletal record (** this may need to be its own module; there are multiple places where it can be executed)
T12: Optional: validate taxonomy

Module 3: Imaging

Task list (the Dependency(ies) and Resource(s) columns were left blank):

T1: Start stable light source and allow it to reach running temperature (or check flash operation)
T2: Calibrate camera to balance exposure and white balance against a color chart
T3: Add metadata (copyright, photographer, type of photo, …)
T4: Apply color adjustment (controversial)
T5: Place scale and color bar in the imaging frame
T6: Redact locality information for sensitive specimens
T7: Frame the specimen
T8: Image the complete specimen (herbarium sheet)
T9: Image the label
T10: Image the ancillary/archival material (ledgers, field notes)
T11: Optional: close-up imaging (image the barcode)
T12: Light the specimen
T13: Scan barcode (in order to rename the file)
T14: Rename file
T15: Publish image to a public or private location
T16: Archive and create derivatives (web presentation file, OCR file)
T17: Quality Control/Quality Assurance
T18: Stamp to indicate the specimen has been imaged
T19: Return specimen to the collection

Module 4: Data Enrichment

Task list (the Dependency(ies) and Resource(s) columns were left blank):

T1: Georeferencing
  T1a: Ingest locality data set into the georeferencing tool
  T1b: Attempt automated georeferencing
  T1c: Validate georeferencing results by reviewing map results
  T1d: Adjust points (manual keying or crowdsourcing)
  T1e: Add error radius/shapefile to define precision
T2: Optical Character Recognition (OCR)
  T2a: Ingest label images into the OCR tool
  T2b: Delineate regions of interest with text (Apiary) and identify text classification
  T2c: Attempt OCR on the label
  T2d: Archive raw text
  T2e: Validate OCR results
  T2f: Correct OCR errors (manual keystroking or crowdsourcing)
T3: Natural Language Processing (NLP)
  T3a: Ingest data into the NLP tool (typically OCR’d, but possibly typed into a document)
  T3b: Train/set up/configure grammars and parsing (predefined formats and cases, e.g. dates, duplicates)
  T3c: Attempt automated NLP
  T3d: Validate parsed NLP results
  T3e: Correct parsed NLP results (manual keystroking or crowdsourcing)
T4: Publication of enriched data
T5: Archiving the enriched data
T6: Quality Control/Quality Assurance
T7: Transcription
T8: Access queued images requiring data capture
T9: Database utilizing speech recognition
  T9a: Train the software
  T9b: View the label
  T9c: Read the label
  T9d: Record data into the database
  T9e: Validate results
  T9f: Manually correct errors
T10: Manual Data Entry (Keystroking) - Internal Project Team
T11: Manual Data Entry (Keystroking) - Crowdsourcing
T12: Validate country, state, and county against authority files
T13: Programmatically validate lat/long coordinates
T14: Validate taxonomy against authority files

** A common QC tool would be extremely helpful for the community
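The programmatic lat/long validation listed above can be as simple as range and sign checks, optionally against a per-country bounding box; the bounding box below is an illustrative assumption, not authoritative data:

```python
# Hypothetical per-country bounding boxes: (min_lat, max_lat, min_lon, max_lon).
BBOX = {"United States": (24.5, 49.5, -125.0, -66.9)}

def valid_coords(lat, lon, country=None):
    """Basic sanity checks: on-globe ranges, then an optional country box.
    A common sign error (positive longitude in the western hemisphere)
    is caught by the bounding-box test."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    if country in BBOX:
        lo_lat, hi_lat, lo_lon, hi_lon = BBOX[country]
        return lo_lat <= lat <= hi_lat and lo_lon <= lon <= hi_lon
    return True
```

For example, `valid_coords(29.65, 82.32, "United States")` fails because the longitude sign is flipped.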

Breakout Group B - Specimens pinned in trays

http://tinyurl.com/cha6kto

Breakout Group C - Three-dimensional specimens in boxes/drawers & specimens in spirits in jars

Ledgers/card catalogs (materials not directly associated with specimens)

Tasks (Resources in parentheses; the Dependencies column was left blank):

T1: Select and retrieve object (Human)
T2: Transport to staging area (Human, cart, vehicle)
T3: Locate page(s) (Human)
T4: Image page (Human, camera/scanner)
T5: Name file (Human)
T6: Store file (Hardware, software)
T7: Populate core metadata (process/admin/technical) (Human)
T8: QC images (Human)
T9: Re-store object (Human, cart, vehicle)
T10: Create verbatim data from image file (OCR, etc.) (Human, technology)
T11: Clean/verify data (Human)
T12: Create interpreted data (Human)
T13: Clean and verify data (Human)
T14: QC data and correct if necessary (Human)
T15: Archive (Human, hardware)
T16: Augment data if necessary/desired (taxonomy, georeferencing) (Human, technology)
T17: Archive (Human, hardware)

Labels associated with specimens

Tasks (the Dependencies and Resources columns were left blank):

T1: Select and retrieve specimens/lot/container
T2: Find specimens in lot/container
T3: Transport to staging area
T4: If needed, extract label(s) (out of vials or jars, etc.)
T5: Record/mark label(s) and associated specimen(s) (so the association is not lost; can associate a color placed near the label with a color placed near the jar)
T6: If necessary, transport to imaging station (may be multiple or different - camera/scanner)
T7: Prepare label(s) for imaging (flatten, dry)
T8: Image label(s)
T9: Populate core metadata (process/admin/technical)
T10: QC image(s)
T11: Name file(s) and associate them
T12: Store file(s)
T13: Reassociate label(s) and specimen(s)
T14: Re-store specimen(s)
T15: Create verbatim data from file (OCR, etc.)
T16: Clean/verify data
T17: Create interpreted data
T18: Clean and verify data
T19: QC data and correct if necessary
T20: Archive
T21: Augment data if necessary/desired
T22: Archive

Specimens

Tasks (the Dependencies and Resources columns were left blank):

T1: Select and retrieve specimens/lot/container
T2: Find specimens in lot/container
T3: Transport to staging area
T4: Order specimens for optimal imaging efficiency (i.e., to prevent frequent lens changes)
T5: Record/mark label(s) and associated specimen(s) (so the association is not lost)
T6: If necessary, transport to imaging station
T7: Select appropriate imaging equipment/materials
T8: Follow imaging policy
T9: Set up camera/imaging station (may need to be set up each time and disassembled, e.g. for security reasons)
T10: Set up image naming convention
T11: Extract and position specimen
T12: Pre-imaging specimen prep (blackening/place under liquid/shot of air)
T13: Adjust hardware and software (focus, etc.)
T14: Image specimen(s)
T15: Potentially take multiple images (stacking or multiple views)
T16: QC images while being shot (focus, unwanted items in frame, color and saturation balance)
T17: Retake images if necessary
T18: Stack images if necessary
T19: Archive (temporary or permanent)
T20: Batch image processing (batch editing - crop, resize, saturation, color balance, white balance, scale bar)
T21: Archive (temporary or permanent)
T22: Human image processing
T23: Create derivatives (jpgs for web; attach to db record; thumbnail catalog)
T24: Populate core metadata (process/admin/technical)
T25: Name files and associate them
T26: Store file(s)
T27: Reassociate label(s) and specimen(s)
T28: Clean specimen if necessary (after any treatments above - blackening, etc.)
T29: Re-store specimens
T30: Create verbatim data from file (OCR, etc.)
T31: Clean/verify data
T32: Create interpreted data
T33: Clean and verify data
T34: QC data and correct if necessary
T35: Archive
T36: Augment data if necessary/desired
T37: Archive

Plenary: reports back from the Breakout Groups and discussion (Notes)

Group C: Consider workflow augmentations for stratigraphic specimens that may need to include research steps.

Non-Destructive imaging is a requirement for scientific publication. Should be considered in workflow design/explanation.

Vision for the Future / Minority Reports / Out-of-the-box ideas (Notes)

Paul - robotics and engineering - not included in the workshop

Amanda -- show these to robotics now (so that they have workflows to look at)

Christopher -- be realistic -- base expectations on feedback from robotics about real capabilities (robots can’t handle curled sheets or “fuzzy” issues; if all specimens were exactly the same, with no variability, it works, but each process has its own quirks)

Andrea -- Data Management missing from the discussion

Andy -- barcodes not needed (just catalog number)

Paul M. -- why are we imaging?

        Talia -- don’t need to; one fast typist can enter from the ledger (don’t need the image)

        Andy -- but an image means more people can database at one time.

        Les Landrum - image b/c you may have a fire / explosion

        Laurie - traditional materials (ledgers) are dark / gray lit.

        Ed -- instead of pulling the specimen to check label data, look at a verbatim image of the label as a way to check its veracity

Austin -- do you print out a copy of the database (OCR Font), in case of disaster (sun solar flares)

        Andy - yes, ledger on legal paper copy

        Austin -- 10 reams of paper -- to do 10 specimens per sheet

        Linda -- any change in time -- not captured, space constraints

                decided electronic redundancies are needed / better

        Dean - often people don’t back up, or only have 3 week type back up

        Jason B. -- we’ll soon outpace our ability to back up all the data we are creating?

                        -- what about an appliance to do this for the community?

        Andy -- something NSF could invest in: infrastructure for the community across the country

        Amanda -- 100s of servers distributed across the country?

        Andy -- people have space problems already?

if NSF funded nodes -- for reciprocal, distributed data backup -- a tool needed for the NIBA Community Implementation Plan

Jim - What happens when ADBC ends in 8 years? What is the sustainability plan? How do we keep momentum?

        Louis Zachos: demand

        Ed Gilbert: enable people, tools to be able to digitize on their own

Andy: data that is digitized -- is being used -- metrics to show that data is being utilized. show that it’s useful

Andy: make sure people cite every source, every time

Jennifer: image copyright

Les Landrum: model for sustainability

Amanda: (national foundation for collections?)

Model where Users pay for data (some small amount)

Laurie: Library institutional support? What use is data to community?

Ed: Opportunity for Education / Outreach applications to show / demo usefulness

        user can create a species list on map

Plenary wrap-up discussion. DROID Working Group strategy for polishing and dissemination of workshop products. (Notes)

iDigBio Working Groups

        Gil: Working Groups by domain

                Herbarium Working Group

See list of working groups on the idigbio website: https://www.idigbio.org/wiki/index.php/IDigBio_Working_Groups

Word Bank for Common Terms

(Some suggested primary task clusters are given below)

Primary Task

Sub-Task (May be Blank)

Community Term

Specimen Imaging

Rename the Specimen Image File

Rename Specimen Image File

Label Imaging

Capture Label Image

Pre-Digitization Curation

Stage

Pre-Digitization Curation

decide what to digitize

prioritize

Pre-Digitization Curation

vet taxon names applied to specimens

check taxonomy

Pre-Digitization Curation

count specimens

Pre-Digitization Curation

sort specimens (by some trait: size, color, sex, collecting event, ...)

Sort

Pre-Digitization Curation

label specimens (with pen or paint)

Pre-Digitization Curation

barcode specimen

apply specimen GUID

Pre-Digitization Curation

Image Processing

Process Image

Image/Data Storage

GeoReferencing

GeoReference

Proofreading

Quality Control

Quality Assurance

Parking Lot - Future Action Items and Notes That Do Not Fit Elsewhere

Workshop Agenda

DROID:  Developing Robust Object-to-Image-to-Data Workflows

A Workshop on the Digitization of Biological Collections

30th - 31st May 2012

The DROID workshop is organized by Integrated Digitized Biocollections (iDigBio), a National Resource Center at the University of Florida and Florida State University, in collaboration with the Botanical Research Institute of Texas, Yale University, and the University of Kansas. The workshop is supported by the U.S. National Science Foundation’s Office of Cyberinfrastructure and Directorate for Biological Sciences, through the Scientific Software Innovation Institutes (S2I2) and Advancing Digitization of Biological Collections (ADBC) Programs.

Overview:

Biological specimens document the historical and modern occurrence of plant and animal species--and most of what we know about the diversity and distribution of life on earth. This research workshop addresses the design, documentation, and optimization of Object-to-Image-to-Data workflows for digitizing biological specimens which are curated in thousands of museum and herbarium collections worldwide.

Documenting digitization workflows begins with the recognition of differences that exist between specimen preparation types due to their physical properties and discipline-specific handling, collecting and preservation methods, curatorial and conservation practice, storage environments, data conceptualizations, and data label techniques. Digitizing data recorded on tags tied to vertebrate skins, on labels encircling snakes submerged in solutions of alcohol, on the lilliputian labels of pinned insects, and on the large, verbose labels glued on flat sheets of plant specimens, presents specific constraints and opportunities in each case for efficient digitization workflow design.

Goals of the Workshop:

  1. To illustrate and analyze a diversity of existing biological specimen digitization workflows with the aim of gaining a deeper and broader understanding of the practical logistics and efficiencies involved in the handling of biological specimens for the purpose of creating digital database records for publication and for new research applications of the biological, geospatial, and temporal information associated with specimens.
  2. To discuss and dissect the dimensions of: digitization project goal definition, the choice of project outcomes and metrics for their assessment, curatorial practice and technology application, human resource and training issues, social and professional values, and the promised deliverables which impact digitization project definition, processes, and success.  
  3. To engage in the application of lightweight business process modeling (BPM) to create and document reference workflow models for representative disciplines or specimen preservation types with the aim of enabling biological collection curators to implement efficient data capture workflows through comparative analysis and quantitative evaluation of individual workflow steps and tasks.

Workshop Objectives:

  1. To review and examine a diverse set of existing participant collections workflows as case studies, observing constraints, local optimizations, and creative solutions.
  2. To gain exposure to workflow design and implementation techniques from libraries and business.
  3. To consider how existing or proposed workflows could be enhanced or extended to gain cost efficiency, scalability, and generality (for implementing across additional collections).
  4. To identify critical constraints to digitization by discipline or preservation type which represent serious throughput bottlenecks and which may require out-of-the-box solutions and/or redefining digitization project goals or outputs.
  5. To identify opportunities for existing or new technology to address costly labor-intensive steps or processing gaps.
  6. To examine workflow goals, scope, and procedures for efficiencies of cost, staff utilization, technology, and outputs, and to propose general guidelines for evaluating workflow designs and workflow project success.
  7. To identify the synergies of collaborative digitization within TCN workgroups or across innumerable collections within a discipline.
  8. To propose near-term project design research and technology development priorities for accelerating the rate of specimen digitization and data publishing.

Desired Outcomes:

  1. Formation of a working group to collate work done at this workshop and advance the desired outcomes listed here.
  2. To identify best ways, based on existing human resources and technologies, for implementing scalable, efficient solutions for image capture and the integration of label images into data authoring workflows.
  3. To document methods for evaluating and quantifying the efficiency of workflow components and tasks, and their suitability/relevance/necessity to the core digitization project goals.
  4. To contribute to an annotated web resource illustrating common and divergent digitization tasks, issues, and constraints across disciplines/preparation types.
  5. To issue a call to action to identify resources, social and technical approaches, and hardware and software tools to bridge gaps in existing workflow end-to-end integrity.
  6. To produce a publication of Workshop findings in Collection Forum, PLoS ONE, and/or appropriate society/discipline journals.

Schedule:

Day 1, Wednesday, 30 May 2012

Time

Activity

Owner(s)

9:30AM

Welcome, overview, and brief participant introductions

Jason Grabon

Amanda Neill

9:45 AM

Workshop goals and agenda run-through

Chris Norris

Jim Beach

Deb Paul

10:00 AM

Lightning Round of workflow summaries

5 minutes and 1 slide per presenter (~18 presenters)

Participants

10:30 AM

Coffee break

Pascal’s

11:00 AM

Continuation of Lightning Round

Group discussion

Breakout group definition and self-assignment

Participants

12:30 PM

Box lunch

1:15 PM

Training session: Workflow Core Concepts (level-set practices, processes, and developing a common terminology)

Q&A Session

Laurie Taylor

Mark Sullivan

2:00 PM

Presentation: Workflow Elements and Concepts - Common Practices

Gil Nelson

3:00 PM

Coffee break

Pascal’s

3:30 PM

Presentation: Social Issues in Collaborative Digitization

Deb Paul

4:00 PM

Breakout Groups: small groups self-assigned by disciplinary interest to identify and record commonalities and divergences in:

  • Current workflow constraints
  • Workflow processes across institutions
  • Metrics - how to measure success
  • Workflow evaluation matrix

Breakout Groups & Moderators

5:45 PM

Review of evening activity and Day 2 agenda

Amanda Neill

6:00 PM

Group photo, dinner, and team building activities

7:00 PM

Dinner at Leonardo’s 706

Day 2, Thursday, 31 May 2012

Time

Activity

Owner(s)

9:30 AM

Review of Day 1, Day 2 agenda summary

Amanda Neill

9:45 AM

Breakout Group reports to the re-assembled Plenary Group

Breakout Groups

10:30 AM

Coffee break

Pascal’s

11:00 AM

Pre-Workshop Survey results and discussion

Shari Ellis

11:30 AM

Training session: Business Process Modeling

Brian Anthony

12:30 PM

Breakout Groups reconvene for box lunch and generate one or more redesigned workflows by addressing:

  • What would you change now?
  • Is a consensus workflow possible for your group?
  • Is a consensus workflow possible for a single preparation type? For a taxon?
  • What would you do to optimize these now?

Breakout Groups & Moderators

3:00 PM

Coffee break

Pascal’s

3:30 PM

Plenary: reports back from the Breakout Groups and discussion

Participants

4:30 PM

Vision for the Future. Minority Reports. Out-of-the-box ideas.

Jim Beach

5:00 PM

Plenary wrap-up discussion. DROID Working Group strategy for polishing and dissemination of workshop products.

Amanda Neill

Deb Paul

Gil Nelson

5:30 PM

Adjourn

Software Tools

Software Name

Functionality Delivered

Who is Currently Using

ZBar - http://zbar.sourceforge.net/

1D and 2D barcode reading

BRIT

OCRopus - ocropus.org

OCR, image segmentation

BRIT

GOCR - http://jocr.sourceforge.net/

OCR and 1D barcode reading

BRIT

OpenLayers - http://openlayers.org/

Large Image navigation and zooming. Image segmentation interface.

BRIT

djatoka - http://sourceforge.net/projects/djatoka/

Image server, dynamic tiling of large JPEG2000 images

BRIT

http://jesserosten.com/2010/wireless-tethering-to-ipad

overview of wifi camera tethering

PLH