Oral Presentation Abstracts - Digital Data Conference, Florida

1	Last name	First name	Institution	Role	Title	Abstract	Co-authors	Themes

2	Agosti	Donat	Plazi	President	Material citations: a powerful aide to connect biodiversity data	The corpus of biodiversity literature represents the scientific output produced in biodiversity research. Each species has at least one taxonomic treatment, but in many cases several hundred treatments exist. These are all interconnected and together compose the entire catalogue of life. Each of these treatments is implicitly (sometimes explicitly) linked to specimens in natural history collections. Already at the advent of taxonomy, Linnaeus kept a reference collection for his seminal works, which he cited by listing its geographic origin in a regularly occurring section in the treatments. Today these material citations range from a summary of specimen data (e.g. “20 paratypes from Columbia”) to very detailed, normalized, augmented data including specimen, collection, and gene accession codes as well as collectors, date, collecting methods of host, etc. These material citations represent some of the best curated data about a specimen, full of links between different resources. A material citation is a citation of a specimen and not a formal specimen record, which adds to the confusion on how to handle them. For this reason, we submitted “MaterialCitation” as a new class to TDWG’s Darwin Core standard, which is now open for public review. In this lecture we will explain the new term and its role in the biodiversity knowledge graph and in creating digital accessible knowledge.	Marcus Guidoti, Plazi	Other (e.g.: Use of digitized data in education & outreach)
3	Bakış	Yasin	Tulane University Biodiversity Research Institute	Sn Manager of Biodiversity Informatics and Data Engineering	Evaluating the Image Quality of Digitized Biodiversity Collections’ Specimens	The expansion of digitization of biodiversity collections has gained great significance within the last few decades, especially after the introduction of new methods in morphological analysis and the use of artificial intelligence technologies in species identification. However, this requires multimedia to be captured in a specific format and presented with some data descriptors. In this study, we explored two data resources, each consisting of 2D images of fish specimens from several institutions within Integrated Digitized Biocollections (iDigBio). Approximately 90 thousand images from the iDigBio repository were harvested and analyzed for their suitability for image processing applications. Here, we present our experiences with the 2D images of biodiversity collections’ specimens and introduce new metadata fields specific to image quality.	Xiaojun Wang, Tulane University Biodiversity Research Institute; Henry L. Bart Jr., Tulane University Biodiversity Research Institute	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
4	Bates	John	The Field Museum	Curator	The Biodiversity Collections Network: What is next for extended specimens	The Biodiversity Collections Network (BCoN), in collaboration with the American Institute of Biological Sciences (AIBS), the Natural Science Collections Alliance (NSC Alliance), and the Society for the Preservation of Natural History Collections (SPNHC), is developing plans to continue to engage community input and build consensus towards a broader and more inclusive Extended Specimen Network (ESN). The vision includes a focus on continued collecting and digitization, advancing both physical and cyberinfrastructure, and educating and training our workforce and user communities. The goal of this presentation is to outline next steps in BCoN's activities for the community in these areas, while placing them in the context of the broad ongoing initiatives across the community.	Jyotsna Pandey, Natural Science Collections Alliance	Other (e.g.: Use of digitized data in education & outreach)
5	Benedict	Melissa	National Ecological Observatory Network	Field Ecologist	An Introduction to Digital Data with NEON	The National Ecological Observatory Network (NEON) collects thousands of samples for over 200 different types of data every year to achieve our mission of providing open-source data for the greater ecological community. NEON collects this standardized data at 81 sites across the country using a combination of sensors, remote sensing instruments, and field ecologists that collect data at each of these sites and maintain sensors. NEON’s observational sampling was designed to target several different themes of data including: diversity, abundance, pathogens and productivity. Much of this data is available via download from our data portal, while physical samples can be accessed from our biorepository. An increasing number of samples are being digitized and made available via photographs on our portal. NEON can be a valuable tool for researchers needing data to answer their research questions.		Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
6	Bentley	Andrew	Specify Collections Consortium, University of Kansas	Usability Lead	Extending Specify for a New Biological Collections Computing Paradigm	Enthusiasm for the Extended Specimen vision and the promise of a Digital Specimen Architecture are igniting a global transformation of biological collections computing. The change will democratize biodiversity informatics by introducing new modes of integration for species occurrence data that will have broad, far-reaching impacts. Among other opportunities, the emerging paradigm promises to reconcile the untraceable proliferation of museum specimen records that cascade as copies into aggregators, collaborative databases, research project caches, and into publications cited as IDs. For collections institutions to benefit from this new enterprise architecture, collections management systems must evolve from autonomous, catalog-focused databases into porous, network-embedded platforms of specimen, annotation, update, and usage information. We are extending Specify to address this challenge by engineering network interfaces and organically developing new integrations to position the platform for a new world of symmetrical data flows and value-added data services. Standard network interfaces (APIs) will also position Specify Collections Consortium for modular, collaborative, code development across its member institutions. In this presentation, we will highlight recent work with Specify's APIs as steps on a path to supporting "Extended Museums" as FAIR and full partners in a worldwide Digital Specimen Architecture.	Specify Collections Consortium Staff	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
7	Bird	Jessica	Smithsonian, NMNH Department of Entomology	staff	Digitization of Smithsonian’s AntLab Legacy Database Using Open Source Software	The U.S. National Insect Collection contains over 35 million specimens, and is the second largest in the world. Large efforts to digitize the collection have been made, but the majority remains undigitized. Many curators have large legacy datasets created over many years for research purposes. Different formats and data standards have been used, which do not easily lend themselves to bulk data uploads to NMNH’s Research and Collections Information System. Recently, through a collaboration with student volunteers with Develop for Good (https://www.developforgood.org), data clean-up tools were created using open-source software. These tools assist in the automation of large scale data standardization and manipulation. They were designed specifically to assist in the creation of specimen-level records using the Smithsonian’s AntLab legacy database, but in a way that could be used for other legacy datasets with slight modifications. The AntLab database contains over 25,000 objects collected in over 2,000 locations. It is the most comprehensive dataset for fungus-farming “attine” ants (subfamily Myrmicinae, tribe Attini, subtribe Attina) in the world. Having this specimen-level database available to researchers and the public has the potential to benefit a wide range of biologists because attine-ant agriculture has become a model system for the study of symbiosis, coevolution, and the evolutionary dynamics of conflict and cooperation.	Baadshah Verma, Develop for Good; Michelle Li, Develop for Good; Yian Wang, Develop for Good; Ralph Lee, Develop for Good; Varun Murphy, Develop for Good; Tahmina Ahmad, Develop for Good	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
8	Buschbom	Jutta	Statistical Genetics	Independent scientist, consultant	Counting collections in: towards quantifying collections’ contributions for national bio-economic accounting and the United Nations post-2020 biodiversity monitoring	In the current biodiversity crisis, achievement of the goals of the United Nations Convention on Biological Diversity (CBD) has become of pre-eminent importance. The introduction of the post-2020 monitoring framework transforms those goals into concrete, actionable information that promotes implementation and innovative resource mobilization, and thus can galvanize change. Natural history collections and the collections community contribute accessible and multilaterally shared high-quality, validated primary biodiversity data, scientific and technical expertise, as well as physical and digital infrastructures, tools, services and a wide range of non-monetary benefits, which are directly needed for the monitoring of many of the targets for 2030 and all goals of the 2050 vision. In their fundamental and indispensable roles for the conservation and sustainable use of “natural capital”, scientific collections themselves express and highlight a country’s conservation efforts and investments. Collection institutions, collections, their human resources and services offer opportunities for quantifications of monetary and non-monetary conservation-relevant contributions. These might be incorporated into national accounts within a system of environmental and economic accounting (SEEA). This talk explores potential measures and the status of existing economic work and concepts. Furthermore, it points out existing initiatives, and infrastructure and data resources, which would support (sub-)national to global accounting.	Alina Freire-Fierro, Universidad Regional Amazonica Ikiam; Elizabeth R. Ellwood, iDigBio, University of Florida; Usman Atique, Department of Bioscience and Biotechnology, Chungnam National University; Austin Mast, Department of Biological Science, Florida State University	Influencing Policy (e.g.: Climate change; Environment; Air and water)
9	Chesshire	Paige	Northern Arizona University & Biodiversity Outreach Network	Graduate Student	Completing a National Bee Inventory for the Conterminous United States	Documenting complete bee species distributions on continental scales is a fundamental step for predicting bee species-community responses to changing climate. Towards this end, we estimated the smallest geographic scale for the conterminous United States that could account for 90% of all described bee species in that area. We generated range maps for each of the ~3,500 US bee species to establish species occupancy for pixel sizes of 225 km² to 48,400 km². Completeness analyses were then performed using 1.9 million bee records. We then added community science data and projected occurrences for an additional ~6 million undigitized bee records that occur in US collections to determine the degree to which "completeness" increased. The addition of community science data and the 6 million specimen records increased completeness. Gaps for specific regions and taxa will precisely inform future inventories to complete an inventory of US bees. This framework can be further extended to predict average US bee data completeness at any given resolution if we were to incorporate all undigitized records. Our results provide targets for establishing monitoring programs. Finally, we also created standardized workflows and databases that can be used worldwide to increase bee data completeness on a global scale.	Erica Fischer, Department of Entomology, Michigan State University; Nick J Dowdy, Milwaukee Public Museum, Department of Zoology; Alice C Hughes, Centre for Integrative Conservation, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences; Michael C Orr, Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences; John S Ascher, Department of Biological Sciences, National University of Singapore; Terry Griswold, United States Department of Agriculture - Agriculture Research Service; Neil S Cobb, Biodiversity Outreach Network; Lindsie M McCabe, United States Department of Agriculture - Agriculture Research Service (USDA-ARS)	Grand Challenges and Expanded Uses of Digital Biodiversity Data (e.g.: Human/public health; Biotechnology; Infectious disease transmission and mitigation; Food, soil, water security; Bioinspired design)
10	Cook	Kimberly	Indiana University	Research Assistant	Taxonomic concept mapping among historical floras of Alaska: decision-making and digital implementation	Awareness of the problem of shifting taxonomic concepts for taxon names is increasingly widespread, but examples of taxonomic concept mapping (TCM) and methodologies are still few. Here we will describe a TCM workflow and the resultant dataset of relationships among taxonomic concepts found in key historical floras of Alaska (Hultén, Welsh, Cody, Flora of North America, and the Panarctic Flora). We mapped 13 genera, which contain 557 taxonomic concepts and 482 taxon concept relationships. Each relationship is recorded in a web application and database, and displayed as a graph. Although some cases were simple (i.e. concepts for a name were congruent throughout), there were complex cases that would be difficult to understand without graphical representation. This presentation outlines the decision process of assigning taxon concept relationships among taxonomic concepts inferred from descriptions in the floras. Decisions are based on descriptions of morphology, geography, and synonyms, but all three factors must be weighed against one another when assigning a relationship. The database structure and web application (see poster by Webb et al.) permit our workflow to be implemented by others, and the mappings to be integrated into specimen datasets.	Stefanie M. Ickert-Bond, University of Alaska Museum of the North; Campbell O. Webb, University of Alaska Museum of the North	Other (e.g.: Use of digitized data in education & outreach)
11	E. Patrick	Rashleigh	Brown University Library	Staff	Broadening access to the digital herbarium through community- and user-centered design processes	Over the past decade, a rapidly increasing number of herbarium specimens have been made available online. While this has greatly expanded ease of access, the collections nevertheless often remain inaccessible behind search and discovery interfaces that assume specialized knowledge. The Herbarium UX (HerbUX) project is an IMLS-funded exploration of users of digital herbarium collections which aims to develop interface proposals based on user research. Over the course of 2020, we conducted a series of meetings with collections stakeholders, including undergraduate science educators, museum professionals, and herbarium staff, in order to further understand how they engage (and seek to engage) with digitized collections of plant specimens. With this input from the user community, several themes emerged, including the need for a broader, more interdisciplinary context to the specimens; easier, more engaging, and aesthetic search and discovery experiences; and the use of familiar metaphors to generate, arrange, and share subcollections. These findings will be of particular interest to those responsible for the design of collections interfaces, and to those who are interested in expanding the use of collections beyond specialists.	Tim Whitfeld, Bell Museum Herbarium (MIN) University of Minnesota; Rebecca Y. Kartzinel, Brown University Herbarium (BRU)	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
12	Flemming	Adania	University of Florida	Graduate student	Looking for marginalized students use of Natural History Collections data	Natural history museums (NHMs) are composed of millions of specimens which provide an opportunity for scientists to understand the biodiversity of our planet. Scientists use research collections to ask big questions and to find solutions for a myriad of ecological, evolutionary and behavioral problems. Making NHMs more accessible to a diverse array of audiences has therefore been a primary focus of NHMs in the 21st century. As a result of increased digitization efforts, many publications have argued that the unprecedented access to specimens and their accompanying data renders NHMs more accessible to a diverse audience. In our study we were interested in understanding how undergraduate students from marginalized backgrounds were accessing and using NHMs and what role museum specimens play in this process of learning and career development for undergraduates. A comprehensive literature review was conducted, and primary data from a 23-item web-based Qualtrics survey (N = 103 instructors) were collected and analyzed to answer our question. The items varied in response dimensions, ranging from close-ended responses to open-ended responses. The open-ended inquiries provided further nuance and insight to our understanding of how tertiary level students interacted with natural history museums using various in-person and online platforms. In this presentation we will share preliminary findings from the literature review and survey.	Molly Phillips, University of Florida/FLMNH/iDigBio; Temi Alao, University of Florida Sociology Department	Other (e.g.: Use of digitized data in education & outreach)
13	Giermakowski	Tom	University of New Mexico	staff	Data from museum specimens in conservation action: two case studies from New Mexico.	The availability and use of biodiversity occurrence data based on specimens has been increasing steadily in the last decade, particularly due to efforts by museums to digitize historical data associated with specimens. We present two use cases of specimens and their associated data used in conservation, from a herpetological collection and a herbarium, both housed at the Museum of Southwestern Biology at the University of New Mexico. In collaboration with Natural Heritage New Mexico, we reevaluated the distribution of the Arizona Toad in Arizona and New Mexico, based on analysis of specimens and historic and recent records of occurrence. We fit a Random Forests model to these occurrence records, using a suite of bioclimatic, hydrological, and land-cover covariates as predictors. Using the MaxKappa method of model thresholding, our model identified an area totaling 5,167,655 ha as suitable habitat, including multiple riverine reaches that have not yet been sampled. Using this model, we were able to identify areas for field surveys, as well as examine potential ecological boundaries between different species of toads in western New Mexico. Species distribution modeling has already been used by Natural Heritage New Mexico to guide survey efforts for rare plant species. In collaboration with the Bureau of Land Management, Natural Heritage New Mexico developed a species distribution model for Kuenzler’s Hedgehog Cactus. To validate the model, we randomly placed survey locations in both habitat and non-habitat areas, as identified by our model. Our survey efforts resulted in the identification of 35 new cactus observations falling within the modeled area. Research projects of this type highlight the importance of museum specimen and data curation as well as effective collaborations. These examples also show how specimens and associated information provide effective uses of data housed in natural history collections and ultimately affect conservation actions.	John P. Leonard, University of New Mexico; Richard J. Norwood, University of New Mexico	Conservation (e.g.: Ecological and natural resource restoration; Environmental justice; Preservation of ecosystem services; Invasive species/agriculture; Traditional knowledge)
14	Gunter	Nicole	Cleveland Museum of Natural History	Faculty	Bringing digitized data to the dome.	Planetariums are effective tools for informal education, providing immersive learning experiences that captivate audiences. Historically, most planetarium programs explore the universe looking into space; however, technological advances now promote high-resolution exploration of Earth. As part of an NSF CAREER grant, the Cleveland Museum of Natural History teamed up with planetarium software designers Evans & Sutherland to create new tools to explore digitized data in the dome. Two new tools were developed for the planetarium software Digistar 6 & 7: a plug-in that allows users to dynamically explore digitized data through the iDigBio portal, and a utility that converts Darwin Core data downloaded from aggregators to KML format, providing a user-friendly way to filter and color-code records for use in planetarium programming. The upgraded capacity to explore biodiversity data in Digistar provides endless new programming opportunities that utilize the 1.6 billion specimens digitized worldwide, bringing digitized data to a new audience. This innovative use of planetarium technology to explore publicly available and institutional specimen data may transform how museums with planetariums communicate biological sciences and approach outreach and education.		Other (e.g.: Use of digitized data in education & outreach)
15	Hansen	Sara	Central Michigan University	Grad student	Biodiversity data management from field collection to integration	The need for effectively managed and curated biodiversity data is growing. Natural history collections, published occurrence data, survey or monitoring program data, and field data vary greatly in their structure and purpose. Successful integration of data from diverse sources can inform large-scale research questions and management actions, but there is often a disconnect between collectors of data in the field and end users of the data. Preparation for fieldwork should include considerations of the type and structure of data to be collected, so that their potential is maximized within the context of the research project and they can be seamlessly integrated with other data sources. Data collection methods must be plausible and sustainable in the field while retaining the structure and detail needed for integration and analysis. We provide a framework for the movement of data through the steps of collection, aggregation, and analysis. Once data are collected, they are manipulated to meet the needs of the research project and integration with other field data and preserved specimens. We will share how past projects inform our efforts to simplify and streamline data collection and management so their usefulness is maximized and the burden on the field collector minimized.	Blake Cahill, Central Michigan University; Rachel Hackett, Michigan Natural Features Inventory; Michael Monfils, Michigan Natural Features Inventory; Anna Monfils, Central Michigan University	Conservation (e.g.: Ecological and natural resource restoration; Environmental justice; Preservation of ecosystem services; Invasive species/agriculture; Traditional knowledge)
16	Hantak	Maggie	Florida Museum of Natural History	post-doc	Assessing organismal color pattern variation using museum collections, computer vision modeling, and web-based community science images	Color polymorphic organisms offer a unique system for studying intraspecific phenotypic responses to climate change as distinct phenotypes experience different selective pressures. Here, we focus on a use-case of a striped/unstriped color pattern polymorphism in the geographically widespread Eastern Red-backed Salamander (Plethodon cinereus). The ecological and evolutionary mechanisms influencing the geographic patterns of coloration in P. cinereus color morphs remain unclear, and no studies have examined range-wide patterns of the polymorphism. We developed two methods to obtain a high volume of contemporary and historical color morph data. For contemporary data, we extracted images of P. cinereus from iNaturalist, created a training dataset, and developed an automated image analysis pipeline based on convolutional neural networks, which we then used to analyze the remaining images. For historical data, we created a pipeline to extract trait data from fluid-preserved museum specimens where we batch-photographed salamanders, de-aggregated individual specimens from photographs, and solicited the help of community scientists to score color morphs. With these datasets, we tested whether color morphs are associated with different climate predictors throughout the species’ range. Overall, this work highlights new practices of extracting high-volume trait data from community science images and museum specimens to test outstanding biological questions.	Robert Guralnick, Florida Museum of Natural History; Brian Stucky, University of Florida; David Blackburn, Florida Museum of Natural History	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
17	Hultgren	Kristin	Seattle University	Faculty	Quantifying Morphology Using Digital Images: A Tale of Two Symbionts	Here we explore the use of taxonomic illustrations and digital photographs for studying morphological variation in two different crustacean groups—the family Pinnotheridae (pea crabs) and the genus Alpheus (snapping shrimps). We extracted simple morphological measurements (body and claw aspect ratio) from ~1800 taxonomic illustrations and standardized digital photographs using image analysis software, and compiled host and habitat data from the literature, to examine the degree to which body and claw morphology correspond to host/habitat use. We also examined the effects of investigator and image type (photo vs. drawing) on estimates of host morphology, and variation within a species. In alpheid snapping shrimp, minor claw aspect ratio is strongly correlated with habitat use; burrow-dwelling species have minor claws with a larger aspect ratio, while crevice-dwelling species have minor claws with a smaller aspect ratio. In the pea crabs, body aspect ratio is strongly correlated with host use: species symbiotic with tube-dwelling hosts have a significantly higher body aspect ratio than species living with non tube-dwelling hosts. Taxonomic illustrations and digital photographs represent an underutilized and potentially valuable source of data for studying variation in morphology among different species.	Christine Foxx, Iowa State University	Other (e.g.: Use of digitized data in education & outreach)
18	Iwanycki Ahlstrand	Natalie	Natural History Museum Denmark	Postdoc	Herbarium phenology data and iNaturalist data outperform citizen science data in detecting response of flowering time to climate change	Innovative means have been developed to extend the temporal and spatial range of phenological data, obtaining data from herbarium specimens, citizen science programs, and biodiversity data repositories. These different data types have seldom been compared for their effectiveness in detecting environmental impacts on phenology. To address this, we compare three separate phenology datasets from Denmark: i) herbarium specimen data spanning 150 years, ii) citizen science data collected over a single year, and, iii) data derived from observations in iNaturalist over a single year. Each dataset includes flowering day of year observed for three common spring flowering plant species: Allium ursinum, Aesculus hippocastanum, and Sambucus nigra. Herbarium data had the strongest effect of spring temperature on flowering in Denmark, due to inter-annual variation. The iNaturalist dataset detected mildly significant effects whereas no significant effects of climate on flowering times were detected in the citizen science dataset, due to smaller sample size and limited geographic sampling. Citizen science data and iNaturalist observations will increase in value as they include more years, but for now, the iNaturalist observations appear to be more valuable in climate change studies and more easily combined with observations from herbarium specimens.	Richard B. Primack, Anders P. Tøttrup	Conservation (e.g.: Ecological and natural resource restoration; Environmental justice; Preservation of ecosystem services; Invasive species/agriculture; Traditional knowledge)
19	Jurburg	Stephanie	German Centre for Integrative Biodiversity Research (Halle-Jena-Leipzig)	Postdoc	The archives are half-empty: an assessment of the availability of microbial community sequencing data	As DNA sequencing has become more popular, the public genetic repositories where sequences are archived have experienced explosive growth. These repositories now hold invaluable collections of sequences, e.g., for microbial ecology, but whether these data are reusable has not been evaluated. We assessed the availability and state of 16S rRNA gene amplicon sequences archived in public genetic repositories (SRA, EBI, and DDBJ). We screened 26,927 publications in 17 microbiology journals, identifying 2015 16S rRNA gene sequencing studies. Of these, 7.2% had not made their data public at the time of analysis. Among a subset of 635 studies sequencing the same gene region, 40.3% contained data which were not available or not reusable, and an additional 25.5% contained faults in data formatting or data labeling, creating obstacles for data reuse. Our study reveals gaps in data availability, identifies major contributors to data loss, and offers suggestions for improving data archiving practices.	Maximilian Konzack, iDiv; Nico Eisenhauer, iDiv; Anna Heintz-Buschart, iDiv	Genomic Data (e.g.: eDNA; Vouchers for genomic samples; Landscape genetics; Biodiversity genomics)
20	Kumar	Neha	University of California, Berkeley	Graduate Student	Digitizing Insect Specimen Photographs with an OCR and Machine-Learning Enabled Information Extraction Pipeline	Understanding the reciprocal impacts of organisms and global environmental change requires harnessing species distribution data preserved in natural history collections. However, digitization of insect specimens lags behind other taxonomic groups due largely to the idiosyncrasies and volume of specimen labels to be processed. Our automated digitization pipeline uses Google’s CloudVision API for optical character recognition on photographed specimen labels. Deep learning-powered question-answering, regex, and lookups retrieved fields including taxonomy, collector, location, and date collected. Processing of ~400k specimens from the Essig Museum of Entomology via this pipeline was over 10000x faster than manual annotation. In a case study of 1000 randomly-selected leaf-cutter bee specimens (Hymenoptera: Megachilidae) from the United States, we extracted county and state information from 68.6% and 77.2%, respectively. Granular locality (e.g., 2 miles east of Berkeley) was accurately extracted for 60% of records. All 1000 records were processed through GeoLocate to estimate geo-coordinates, extracting latitude and longitude for 64.5% of records for locality analyses: 39.9% high precision, and 24.6% low precision (only county or city centers). While difficulties exist, especially for handwritten, highly-abbreviated labels, significant speed improvements and high accuracy on many specimen labels make deep learning approaches viable tools for digitization.	Suhas Gupta, University of California, Berkeley; Shweta Sen, University of California, Berkeley; Apik Zorian, University of California, Berkeley; Fred Nugen, School of Information, Division of Computing, Data Science, and Society, University of California, Berkeley, Berkeley, CA; Alberto Todeschini, School of Information, Division of Computing, Data Science, and Society, University of California, Berkeley, Berkeley, CA; Peter T Oboyski, Essig Museum of Entomology, University of California, Berkeley, Berkeley, CA	Artificial Intelligence (AI) (e.g.: Machine learning; Data mining/parsing; computer vision)
21	LeVan	Katherine	Battelle / National Ecological Observatory Network	staff	The broad reach of the NEON extended specimen	The National Ecological Observatory Network (NEON) is a continental-scale environmental monitoring platform designed to provide data and samples across spatial and temporal scales. Since 2013, NEON has archived more than 250,000 specimens - collections that include environmental samples, vouchers, and DNA extracts stored in conditions to enable transformative research. Once archived, all NEON collected samples are searchable through the NEON Biorepository portal (https://biorepo.neonscience.org) and are physically and digitally curated to facilitate sample discoverability and specimen reuse. The extended specimen concept within the NEON collections uses programmatic updating of specimen records using API calls and Darwin Core-compliant ontologies. The NEON Biorepository uses a Symbiota platform for its specimen portal, and significant development on modules within this platform facilitates linking of related specimens (associated occurrences), globally unique identifiers (IGSNs via SESAR), and (via ORCID links, where available). Where available, images, morphometrics, genetic sequences and taxonomy are associated with each specimen. NEON samples are available for additional analyses; publications and results of these analyses are linked to the specimen records on the portal. This talk will discuss how NEON’s approach to the extended specimen paradigm can facilitate biodiversity research, give examples of current projects, and provide a demonstration of portal search capability.		Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
22	Marcel	Kouete	University of Florida	Student	Bridging the gap between natural history specimens and their microbial communities	Historical amphibian collections are currently being used to document the spatial and temporal presence of the fungal pathogen Batrachochytrium dendrobatidis (Bd) that causes chytridiomycosis and is related to species declines and extinction. Similarly, epidemiology and microbiology have expanded our understanding of microbial diversity and the relationship between eukaryotes and their associated internal and external microbial communities. Unfortunately, we lack tools to characterize microbial diversity from fluid preserved specimens in the same way as screening for Bd. The best scenario for sampling microbial diversity remains to obtain data of individuals before preservation as specimens. To bridge this gap, we organized two collecting field trips in Cameroon (Central Africa) in 2018 and 2019 and obtained voucher specimens of frogs and caecilians with associated skin and gut microbial samples. Our materials were accessioned at the Florida Museum of Natural History and specimens are being made available to other researchers through biodiversity repositories. After molecular and bioinformatic analysis of our microbiome samples these data will be deposited in an appropriate public repository such as Qiita. This work allows us to add further value to the preserved frogs and caecilians specimens currently housed in museums and provides new data for their microbial communities and pathogens.	Molly C. Bletz, University of Massachusetts Boston; Brandon C. LaBumbard, University of Massachusetts Boston; Douglas C. Woodhams, University of Massachusetts Boston; David C. Blackburn, Florida Museum of Natural History	Genomic Data (e.g.: eDNA; Vouchers for genomic samples; Landscape genetics; Biodiversity genomics)
23	Mason	Noelle	Colorado State University	undergraduate	Improving the Efficiency of DNA extractions for Avian Climate Change Research	Because birds are ubiquitous throughout the world, tracking avian population trends has become a critical indicator of biodiversity loss across species. Genomic tools are often used to assess climate vulnerability or understand adaptations in bird populations. These tools require a high-quality DNA sample, often collected from blood or feathers. Blood collection is often limited in the field, so it is necessary to maximize DNA yield from avian blood samples. Manufacturer extraction kits offer a standardized approach to DNA extraction from blood, though modification of the protocol may increase DNA yield. I anticipate that modifying the recommended protocol to include the enzymatic enhancement buffer ATL and increasing the incubation time will yield more DNA extract. To test this, we captured Black-capped Chickadees and Red-breasted Nuthatches at bird feeder complexes around Fort Collins, Colorado and collected blood samples from each bird. In preliminary results, the addition of Buffer ATL coupled with increased incubation time nearly quadrupled the yield of DNA from blood samples. Improving the quantity of DNA extract will increase the accuracy of a wide variety of genomic studies aiming to understand the evolutionary responses of birds to climate pressures and increase avian conservation measures.	Ruegg, Kristen; Rodriguez, Marina; Schweizer, Teia	Genomic Data (e.g.: eDNA; Vouchers for genomic samples; Landscape genetics; Biodiversity genomics)
24	Mast	Austin	Florida State University	Faculty	Trailblazing Rapid Biodiversity Data Enhancement to Address Emergent Crises and the Case Study of Horseshoe Bats	Genomic evidence suggests that the causative virus of COVID-19 (SARS-CoV-2) originated in horseshoe bats (Family Rhinolophidae) and that species in this family, as well as in two closely related families, are reservoirs of several SARS-like coronaviruses. Specimens collected over the past 300 years and curated by the world’s natural history collections provide an essential reference as scientists work to understand the distributions, life histories, and evolutionary relationships of these bats and their viruses. We collaborated to quickly produce a deduplicated, standardized, vetted, and versioned data product of 89,837 specimens of the focal bats shared from 118 natural history collections through the iDigBio portal and Global Biodiversity Information Facility. The project serves as a model for future rapid data enhancements about biodiversity specimens, having generated protocols for georeferencing collection locations and standardizing collector names, scientific names, collection dates, and linkages to additional resources. We will introduce the protocols and code written to be used for this project and to be repurposed for new rapid data enhancement campaigns, including the new functionality created for GEOLocate’s Collaborative Georeferencing platform and BIOSPEX. The data product is shared at Zenodo: https://doi.org/10.5281/zenodo.3974999.	Deborah Paul, Florida State University and Illinois Natural History Survey; Nelson Rios, Yale University; Erica Krimmel, Florida State University; Robert Bruhn, Florida State University; Aja Sherman, Florida State University; Katelin Pearson, Florida State University; Trevor Dalton, Florida State University; David Shorthouse, Bionomia; Nancy Simmons, American Museum of Natural History; Pam Soltis, University of Florida; Nathan Upham, Arizona State University	Grand Challenges and Expanded Uses of Digital Biodiversity Data (e.g.: Human/public health; Biotechnology; Infectious disease transmission and mitigation; Food, soil, water security; Bioinspired design)
25	McLean	Bryan	University of North Carolina Greensboro	faculty	Individual-level trait-bases for small mammal phenology research	Impacts of global change on animal life histories are projected to be diverse across space, time, and traits. However, spatially and temporally dense life history trait datasets are still lacking for all but the most easily-observed of species. We took an informatics-based approach to reconstructing breeding phenology and its drivers in three clades of small mammals that are widespread across North America (Peromyscus mice, Microtus and Myodes voles, Sorex shrews). To do this, we combined reproductive trait observations from disparate digital biodiversity datasets (museum specimens and NEON field monitoring) and used them to a) reconstruct species-specific breeding phenologies in different ecoregions, and b) test the importance of environmental variables as breeding cues. Despite the high heterogeneity in these data, we reconstructed phenologies in these very common small mammals. We also demonstrate how phenologies and use of distinct environmental cues vary based on physiological and life history differences. Further, for a subset of species, we were able to link use of breeding cues to ecosystem-specific limiting conditions. Our results provide insight into small mammal reproduction in the wild and highlight the critical need for denser trait-bases for these secretive and hard-to-monitor taxa.	Robert Guralnick, Florida Museum of Natural History	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
26	Miller	Joe	GBIF	Executive Secretary	Identifying and clustering similar occurrences across collections using GBIF tools	GBIF has implemented an exploratory feature in the data portal to cluster occurrences using shared attributes such as similar collector, taxon, identifiers, dates, type status and locality. The algorithm searches across all datasets shared with GBIF such as specimens from natural history collections, but also datasets from Barcode of Life, Genbank and specimen references text-mined from literature. Often these sample and literature-based occurrences are not directly linked to the source specimen occurrence, but as they contain attributes used to form occurrence clusters (collector or specimen IDs, dates and locality data) they can be processed to unambiguously point to a particular specimen also shared with GBIF. A cluster can bring together many components of the Extended Digital Specimen: the specimen itself, data that is embedded with it such as images and field data, and derived DNA sequences and literature citations from occurrences within the cluster. The algorithm is particularly useful in clustering herbarium duplicates which brings together the varied curation efforts (georeferencing, identifications) conducted on separately managed specimens. We will describe the clustering methods using several examples, point to possible future development and ask for your input on how to proceed at GBIF. This talk will also describe potential cost savings due to deduplication of efforts.	Nicky Nicolson, RBG Kew; Tim Robertson, GBIF	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
27	Nordén	Klara	Department of Ecology and Evolutionary Biology, Princeton University	PhD student	Detecting iridescent feather nanostructures with polarization imaging	Iridescent plumage coloration produces shimmering, metallic colors and has attracted awe and fascination from scientists and artists alike for centuries. However, the evolution of this trait in birds has remained mysterious. A major obstacle is the lack of fast, non-destructive methods to quantify iridescence. Iridescence arises from nanostructural ordering within feather filaments, and current methods rely on imaging feather cross-sections under a transmission electron microscope (TEM). This method is expensive, time-consuming and requires feather samples to be plucked from specimens. Thus, it cannot support large, macroevolutionary studies of iridescence. Here, I present a novel non-destructive method to detect iridescent nanostructures in feathers which only requires imaging of specimens (museum bird skins). Iridescent structures function as tiny mirrors, which gives them unique optical properties compared to pigmentary or non-iridescent structural colors. I exploit this by measuring the difference in the degree of polarization of reflected light from different types of plumage. Iridescent plumage reflects more polarized light than other plumage types, which together with light intensity can be used to distinguish them. My results suggest that polarization imaging could be a powerful, non-destructive and inexpensive way to quantify iridescent nanostructures. Ultimately, this method could lead to new insights into the evolution of iridescence.	Mary Caswell Stoddard, Princeton University	Artificial Intelligence (AI) (e.g.: Machine learning; Data mining/parsing; computer vision)
28	Norton	Ben	North Carolina Museum of Natural Sciences	Staff	Standardized Value APIs: The Next Step in the Evolution of Biodiversity Data Sharing	Application Programming Interfaces (APIs) are the foundation for the modern web. They enable independent applications, systems, or databases to share information securely and seamlessly while maintaining functional independence. From that capability, web-based APIs enable the development of community-level digital workflows that can integrate biodiversity data processing across platforms to create a global ecosystem of value-added services easily accessible from within local data systems. Innovative efforts in web-based API development are central to addressing the many challenges in biodiversity informatics. One of the most profound challenges facing the biodiversity data community is the issue of value standardization. The Darwin Core Standard provides a comprehensive, evolving yet stable, standardized terminology for data sharing. The next step is to couple the standardized terms with standardized values. We show how web-based Application Programming Interfaces, branded as Standardized Value APIs (SVAPIs), are critical to this next step in the evolution of biodiversity data sharing. Further, we discuss how SVAPIs have been integrated into existing data flows and their profound impact on data integrity and quality.		Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
29	Pearson	Katie	California Polytechnic State University	Project manager	New Tools to Score, View, and Download (Phenological) Trait Data in a Symbiota Portal	Biodiversity specimens contain a wealth of trait data, including size, shape, reproductive status, and evidence of interacting species. These data are part of the Extended Specimen Network concept. In aggregate, trait data from millions of digitized specimens allow researchers to unlock new discoveries and answer questions about adaptation, plasticity, and ecological interactions, among many others. To facilitate the creation and use of Extended Specimen data—specifically phenological data such as flowering or fruiting status—the California Phenology Network developed new tools in the CCH2 Symbiota portal (cch2.org). These tools allow a user to score traits of specimens using images or text fields, view trait data associated with specimens, conduct searches for specimens with certain traits, and download trait data associated with specimen data. These tools are developed for the Symbiota Light code platform and will therefore be available for dozens of portal communities that aspire to integrate trait data annotations and score domain-specific traits from specimens. In CCH2, these tools have enabled researchers and undergraduate students to study the effects of climate on phenological events in California plant species. We foresee a diversity of novel applications for these tools as the greater biocollections community refines and expands upon this groundwork.	Edward Gilbert, Arizona State University; Christopher Tyrrell, Milwaukee Public Museum; Nico Franz, Arizona State University; Jenn Yost, California Polytechnic State University	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
30	Sivakumar	Ashwin	Flintridge Preparatory School	student	Fossil-Augmented Species Distribution Models Recontextualize Introduced Turkey Ecology in California	Managing invasive species is a central challenge in conservation. New techniques using Quaternary fossil records and zooarchaeological assemblages reveal that many introduced species have unexpected histories and may represent ecological or taxonomic substitutes for extirpated or extinct species. The Wild turkey (Meleagris gallopavo) is a non-native, potentially invasive species in California. However, it is congeneric with the California turkey (Meleagris californica), an endemic species that went extinct at the end of the Pleistocene. To assess these two closely related species' potential ecological overlap and provide a currently unaccounted for baseline of turkey ecology in California, a species distribution model (SDM) for Meleagris californica was developed based on bioclimatic data and fossil localities from the Last Glacial Maximum employing an ensemble of two machine learning algorithms. This model was then projected into current landscapes using present-day climatic data as a counterfactual and then compared to an SDM for extant Meleagris gallopavo generated using present-day observations from citizen science databases. Qualitative maps and quantitative indices strongly suggest that Meleagris gallopavo in California today largely occupies geographic and environmental spaces similar to those used by Meleagris californica. Cross-validation using other techniques will be needed to confirm the species' role as an ecological substitute.	Alexis Mychajliw, Middlebury College	Conservation (e.g.: Ecological and natural resource restoration; Environmental justice; Preservation of ecosystem services; Invasive species/agriculture; Traditional knowledge)
31	Vardi	Reut	Ben-Gurion University	grad student	Digital human-nature interactions: trends of popularity and seasonal interest in plants	The digital revolution offers many opportunities to explore trends at vast scales. Conservation culturomics is the field dedicated to understanding human–nature interactions through the analysis of large online datasets. Nevertheless, culturomic data entails inherent biases that need to be acknowledged and addressed. Here, we analyzed people’s digital engagement with Israeli plants as these are manifested in Google search patterns and Wikipedia pageviews. We found that people interact with nature differently between these two sources. Overall, Google highlights more species that have utility to humans while in Wikipedia emblematic plants receive more attention. Furthermore, in Google, popular species gain more attention with time, opposite to the trend in Wikipedia. Our results suggest that people’s digital interactions with nature - and thus our ability to use this information for conservation - may be inherently different depending on the sources explored. Nonetheless, harnessing big data sources such as Wikipedia and Google, can serve as an important tool in future conservation efforts. Conservation culturomics can help us better understand human behaviour and interest, construct more efficient conservation initiatives and management plans, and monitor their success.	Uri Roll, Ben-Gurion University	Conservation (e.g.: Ecological and natural resource restoration; Environmental justice; Preservation of ecosystem services; Invasive species/agriculture; Traditional knowledge)
32	Villanueva	Luis J.	Smithsonian Institution	Informatics Program Officer	Digitization of Unstructured Text in Entomology Specimen Labels Using Machine Learning	This project tested whether optical character recognition (OCR) tools that use machine learning (ML) systems can reduce the number of records that have to be transcribed manually in a mass digitization project. We used the images from the digitization of the Entomology collection of bumblebees (genus Bombus) and carpenter bees (genus Xylocopa) to test automated methods of transcribing the labels. Since the labels for all the specimens were going to be transcribed using a transcription vendor, this presented a good dataset to compare automated methods to achieve the same results for less money and less time. We used a sample of manually transcribed records to look for similarities in the text from the OCR. Using this approach, we were able to populate 35-70% of the data in 4 fields of the records. The combination of ML tools and our approach can reduce the cost and time that needs to be invested in mass digitization projects. These tools can also help enhance existing records in the collections.		Artificial Intelligence (AI) (e.g.: Machine learning; Data mining/parsing; computer vision)
33	Welcome	Ashton	South African National Biodiversity Institute	Staff	Using herbarium specimen data to further understand the species of Pavonia Cav. in the FSA region	Pavonia, the only genus of the tribe Malvavisceae in the Flora of Southern Africa (FSA) region, is revised and 12 species are recognised, four of which appear to be endemic to the region. A combination of leaf, epicalyx, calyx and mericarp characters can be used to distinguish between the species of Pavonia. Information about the distribution, ecology and phenology has been digitized from collection labels and can be accessed from the Botanical Database of Southern Africa (BODATSA). This herbarium specimen data, which can be used to further understand each species, will be presented alongside a key to the species.	Janine Victor, South African National Biodiversity Institute	Conservation (e.g.: Ecological and natural resource restoration; Environmental justice; Preservation of ecosystem services; Invasive species/agriculture; Traditional knowledge)
34	Whitfeld	Timothy	Bell Museum, University of Minnesota	Herbarium Collections Manager	Minnesota’s Biodiversity Atlas: an online portal for activating natural history data and facilitating collaboration	In 2016, the University of Minnesota’s Bell Museum opened the Minnesota Biodiversity Atlas, an online Symbiota portal. The Atlas provides public access to hundreds of thousands of plant and animal specimens from the Bell Museum collections, in-state university partners, the Leech Lake Band of Ojibwe, the Science Museum of Minnesota, and a trove of expert observation data from state and federal agencies. The Atlas is one of the only publicly accessible portals to integrate such a range of disparate collections and to incorporate specimens from across the tree of life into a single database. The Minnesota Biodiversity Atlas serves critical needs for on- and off-campus partners. Some examples include: digital records for field botany and ecology classes, guiding field surveys, providing images for identification in the field, and enabling conservation planning by partners including the Minnesota Department of Natural Resources and the Minnesota Pollution Control Agency. Concurrently, the Atlas is enhanced by data collected by these State Agencies including tens of thousands of specimens and hundreds of thousands of expert observations. This combination of data types provides an unusually complete record of plant and animal distribution in Minnesota.	George Weiblen, Bell Museum University of Minnesota	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)
35	Zermoglio	Paula	VertNet	Research lead	BELS Global Gazetteer of Georeferences	Location information constitutes a key component that allows broad use of biodiversity data in research, conservation and decision making. Presently, ~60% of the specimen records shared through the GBIF network have latitude and longitude values. However, the vast majority of those records lack proper georeferences, i.e., at minimum, valid coordinates, datum and spatial uncertainty. The most common georeferencing method is a manual process, which has the benefit of potential high accuracy but requires training and is time consuming, particularly for some types of locations and historical records. Despite these constraints, millions of records have been georeferenced with high accuracy, providing a large corpus of georeferences that could be reused. We developed a Global Gazetteer of Georeferences, containing all distinct combinations of Darwin Core Location class terms from data aggregated in GBIF, iDigBio and VertNet. The gazetteer contains preprocessed determinations of simplified representations of all Locations and of the best existing spatial representation, if any, for every matching string, based on their compliance with georeferencing best practices. Through a set of scripts, API, and initial user interface users can submit Darwin Core Locations for checking against the BELS gazetteer, and retrieve results containing the best georeferences available for each record.	Julia Allen, University of Nevada, Reno; Raphael LaFrance, Florida Museum of Natural History; Robert Guralnick, Florida Museum of Natural History; John Wieczorek, VertNet	Enhancing Digital Records (e.g.: Digital specimens, Extended specimen concept)