Enhancing Biodiversity Database Curation through Automated Extraction of Information from Unstructured Data

Roselyn S. Gabud

Solution

We developed information extraction methods that automatically extract information on the distribution, reproduction, and habitat of species, to aid human curators in updating biodiversity databases.

Problem Statement

Because much of our knowledge of, and investment in, the natural world still resides in and is disseminated through the scientific literature, manually creating and updating biodiversity databases is becoming increasingly laborious.

Features

  • Named Entity Recognition (NER) models for extracting biodiversity-related named entities, e.g., taxonomic names, geographic locations, and temporal expressions.
  • Relation Extraction (RE) models for extracting information on species' time-sensitive reproductive conditions and location-specific habitats.
  • A graph database that stores the related entities and supports querying and visualization, so the extracted data is ready for analysis.

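[Figure: Workflow from unstructured to structured data. Textual documents (unstructured data) are processed by information extraction tools, and the results go into a database (structured data, e.g., a graph database, labeled DB), with a human curator in the loop.]

Unstructured data (textual documents):

  • Contains the bulk of knowledge on biodiversity
  • Thousands of new digital pages are published every month

Example from the figure: “The main observation site was conserved forest at Mariveles, Bataan.” NER labels a Habitat and a Geographic Location in this sentence, and RE links them with the relation [occur in].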
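The example in the figure can be traced end to end with a minimal sketch. This is not the models described above: the NER step is replaced by a hypothetical two-entry dictionary lookup, the RE step by a simple rule that links every Habitat to every Geographic Location in the same sentence, and the graph database by an in-memory adjacency dictionary. The names `Entity`, `GAZETTEER`, `tag_entities`, `extract_relations`, and `build_graph` are illustrative assumptions, as is the direction of the relation (habitat occurs in location).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


# Hypothetical gazetteer standing in for a trained NER model.
GAZETTEER = {
    "conserved forest": "Habitat",
    "Mariveles, Bataan": "GeographicLocation",
}


def tag_entities(sentence):
    """Toy NER: find gazetteer phrases and return them in text order."""
    entities = []
    for phrase, label in GAZETTEER.items():
        start = sentence.find(phrase)
        if start != -1:
            entities.append(Entity(phrase, label, start, start + len(phrase)))
    return sorted(entities, key=lambda e: e.start)


def extract_relations(entities):
    """Toy RE: link each Habitat to each location in the same sentence."""
    habitats = [e for e in entities if e.label == "Habitat"]
    locations = [e for e in entities if e.label == "GeographicLocation"]
    return [(h.text, "occur_in", loc.text) for h in habitats for loc in locations]


def build_graph(triples):
    """Store (head, relation, tail) triples as an adjacency dictionary."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


sentence = "The main observation site was conserved forest at Mariveles, Bataan."
entities = tag_entities(sentence)
triples = extract_relations(entities)
graph = build_graph(triples)
print(graph)
```

A curator-facing tool could then look up `graph["conserved forest"]` to list the locations linked to that habitat. In a real system the dictionary would likely be replaced by a graph database and the lookup by a query.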

Research Interests: Text Mining, Natural Language Processing, Information Extraction, Biodiversity Informatics