1 of 12

OntoChimp: A ChatGPT-Based Concept Extraction Resource for Ontology Development

Samuel Smith

University of Michigan Medical School, Ann Arbor, MI, USA

2 of 12

Motivation and Challenge

• Developing a domain-specific ontology requires identifying a comprehensive, well-defined list of key concepts.

• Manual extraction is labor-intensive and inconsistent.

• OntoChimp provides a semi-automated, NLP-driven approach combining ChatGPT and ontology analysis tools.

3 of 12

Overview of the OntoChimp Process

Workflow 1 extracts key concepts from a document (*) and structures the data plus metadata.

Output: the OntoChimp TermTable, a structured dictionary for each key concept.

Workflow 2 utilizes BioPortal Annotator and other resources to categorize the key concepts.

(* term clusters are also extracted but not yet processed.)

4 of 12

Workflow 1: Key Concept Extraction

Inputs: 29 chapters of 'The Handbook of Solitude' (2021)

Process:

  • Each document is processed individually, so its key concepts are relative to one coherent document.
  • An Excel workbook containing all concepts for a given document is output.
  • The Excel workbooks for the reference document set are combined into the OntoChimp TermTable data structure.

Output: 506 solitude-related key concepts.
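
The combining step can be sketched as follows. This is a minimal illustration, not OntoChimp's actual code: the function names (`normalize`, `build_term_table`), the entry fields (`concept`, `sources`), and the chapter names are assumptions, and the per-document concept lists are plain in-memory lists standing in for the Excel workbooks.

```python
from typing import Dict, List


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so variants of a term share one key."""
    return " ".join(term.lower().split())


def build_term_table(documents: Dict[str, List[str]]) -> Dict[str, dict]:
    """Merge per-document key concept lists into one dictionary keyed by concept.

    `documents` maps a document name to the concepts extracted from it
    (a hypothetical stand-in for one Excel workbook per document).
    """
    table: Dict[str, dict] = {}
    for doc_name, concepts in documents.items():
        for raw in concepts:
            key = normalize(raw)
            # Create the entry on first sight, then record each source once.
            entry = table.setdefault(key, {"concept": key, "sources": []})
            if doc_name not in entry["sources"]:
                entry["sources"].append(doc_name)
    return table


docs = {
    "chapter_01": ["Solitude", "loneliness"],
    "chapter_02": ["solitude", "Introversion"],
}
term_table = build_term_table(docs)
print(sorted(term_table))
print(term_table["solitude"]["sources"])
```

Keying the dictionary by the normalized concept means a concept found in several chapters yields one entry that lists every chapter it came from.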

5 of 12

Example: TermTable JSON Dictionary Output

The TermTable forms a bridge between Workflow 1 and Workflow 2
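
Because the TermTable is a dictionary, one plausible way to hand it from Workflow 1 to Workflow 2 is as a JSON file. The sketch below shows that hand-off with Python's standard `json` module. The entry fields (`sources`, `disposition`), the file name, and the helper names are hypothetical placeholders, not the actual TermTable schema.

```python
import json
import os
import tempfile

# Hypothetical TermTable fragment; the real field names may differ.
term_table = {
    "solitude": {"sources": ["chapter_01"], "disposition": None},
    "loneliness": {"sources": ["chapter_01"], "disposition": None},
}


def save_term_table(table, path):
    """Write the table as human-readable JSON (end of Workflow 1)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(table, fh, indent=2, sort_keys=True)


def load_term_table(path):
    """Read the table back as a plain dictionary (start of Workflow 2)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


path = os.path.join(tempfile.mkdtemp(), "termtable.json")
save_term_table(term_table, path)

# Workflow 2 would load the file and fill in the disposition fields.
reloaded = load_term_table(path)
reloaded["solitude"]["disposition"] = "into-exact"
```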

7 of 12

Workflow 2: Concept Categorization

  • Input: TermTable
  • Process: assign disposition codes via apps and manual review:
    • into = in target ontology (e.g., “solitude” is already in the PHASES ontology)
    • inoo = in another ontology
    • new = novel term
    • rej = rejected
    • Subcodes refine a code, e.g. inoo-exact, into-broad (the key concept, KC, is broader than the term in the target ontology, TO)

  • Output: Curated ontology candidate list with disposition tags.
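
A disposition tag of the form code-subcode can be validated and split mechanically. The following is a minimal sketch under stated assumptions: `BASE_CODES`, `Disposition`, and `parse_disposition` are illustrative names, and only the four base codes listed above are accepted, so any further codes would need to be added to `BASE_CODES`.

```python
from typing import NamedTuple, Optional

# Base codes from the list above; descriptions paraphrase the slide.
BASE_CODES = {
    "into": "in target ontology",
    "inoo": "in another ontology",
    "new": "novel term",
    "rej": "rejected",
}


class Disposition(NamedTuple):
    code: str
    subcode: Optional[str]


def parse_disposition(tag: str) -> Disposition:
    """Split a tag such as 'inoo-exact' into its base code and optional subcode."""
    code, _sep, subcode = tag.strip().lower().partition("-")
    if code not in BASE_CODES:
        raise ValueError(f"unknown disposition code: {tag!r}")
    # An absent or empty subcode is reported as None.
    return Disposition(code, subcode or None)


print(parse_disposition("inoo-exact"))
print(parse_disposition("new"))
```

Rejecting unknown codes early keeps typos in manually assigned tags from silently entering the curated list.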

8 of 12

Example: Categorization Table

Unexpected finding: a concept missing from an existing ontology.

OntoChimp, via BioPortal Annotator, identified that “loneliness” was not in the Emotion Ontology (MFOEM).

We thank the MFOEM maintainers for adding “loneliness” and “feeling lonely” to MFOEM shortly after notification.
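
A check of this kind can be approximated with a request to the BioPortal Annotator REST endpoint. The sketch below is not OntoChimp's code: the helper names are hypothetical, `YOUR_API_KEY` is a placeholder for a personal BioPortal API key, and the response is assumed to be a JSON list with one item per matched class. Treating an empty list as "not in the ontology" is a simplification that ignores synonyms and partial matches.

```python
import json
import urllib.parse
import urllib.request

ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(text, ontologies, api_key):
    """Build a GET URL that annotates `text` against the given ontology acronyms."""
    params = {
        "text": text,
        "ontologies": ",".join(ontologies),
        "apikey": api_key,
    }
    return ANNOTATOR_URL + "?" + urllib.parse.urlencode(params)


def is_term_in_ontology(text, ontology, api_key):
    """Return True if the Annotator reports at least one match (needs network access)."""
    url = build_annotator_url(text, [ontology], api_key)
    with urllib.request.urlopen(url) as resp:
        annotations = json.load(resp)  # assumed: a JSON list of annotations
    return len(annotations) > 0


# Only the URL is built here; no request is sent.
url = build_annotator_url("loneliness", ["MFOEM"], "YOUR_API_KEY")
print(url)
```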

 

Key Concept        Disposition Code  Matched Onto.  Onto. ID         Comment
solitude           into-exact        PHASES         -                Term already planned for the Target Ontology (TO)
intervention       inoo-incl         BCIO           BCIO_037000      inoo in BCIO and selected as candidate for TO
introversion       inoo-incl         MFOEM          MF_0000026       inoo in MFOEM and selected
maternal behavior  inoo-incl         MFOEM          GO_0042711       inoo in MFOEM but ultimately from GO (Gene Ontology)
public policy      inoo-excl         BCIO           ADDICTO_0001109  inoo but too general for use in TO
problem solve      generic           -              -                Valid for domain but too generic for the ontology
carl jung          scope             -              (example)        Valid for domain but out of scope of the ontology; a person
(none)             nval              -              -                Invalid for domain (so far, no KC is invalid!)
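
The rows of the categorization table can be filtered by disposition code to produce a candidate list. In this sketch the rows are copied from the table, while the selection rule (keep every `into` code plus `inoo-incl`) and the name `is_candidate` are assumptions about how the codes might be used, not a documented OntoChimp rule.

```python
from collections import Counter

# (key concept, disposition code, matched ontology, ontology ID) from the table.
rows = [
    ("solitude", "into-exact", "PHASES", ""),
    ("intervention", "inoo-incl", "BCIO", "BCIO_037000"),
    ("introversion", "inoo-incl", "MFOEM", "MF_0000026"),
    ("maternal behavior", "inoo-incl", "MFOEM", "GO_0042711"),
    ("public policy", "inoo-excl", "BCIO", "ADDICTO_0001109"),
    ("problem solve", "generic", "", ""),
    ("carl jung", "scope", "", ""),
]


def is_candidate(code):
    """Assumed rule: keep terms already in the target ontology or marked for inclusion."""
    return code.startswith("into") or code == "inoo-incl"


candidates = [concept for concept, code, _onto, _oid in rows if is_candidate(code)]
code_counts = Counter(code for _concept, code, _onto, _oid in rows)

print(candidates)
print(code_counts.most_common())
```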

9 of 12

Observations & Early Results

  • 506 key concepts identified from 29 chapters.
  • Many novel or underrepresented terms found.
    • Terms reflect what ChatGPT considers a key concept rather than a statistical criterion; a concept may appear only once and still be important.
  • Highlights conceptual richness of solitude-related research.
  • Modular workflows enable reuse in other domains.

10 of 12

Future Work

  • Incorporate term clusters into ontology reasoning.
  • Improve automation in disposition assignment.
    • Incorporate a disposition assignment log into the TermTable
  • Integrate outputs into the PHASES ontology on solitude.

OntoChimp code is in development at: github.com/smithGit/OntoChimp/

11 of 12

Summary

  • OntoChimp bridges AI text analysis and ontology engineering.
  • Produces a reproducible framework for ontology term discovery.
    • The TermTable data structure is useful for processing any set of multi-word terms
  • Supports scalable categorization and validation workflows.

12 of 12

Acknowledgements

  • Funding: U01 grant AG088074.

  • ICBO 2025 organizers and reviewers.
  • Collaborators and OntoChimp development contributors.
  • Team of the "Promoting Healthy Aging through Semantic Enrichment of Solitude Research" (PHASES) Project: John Beverley, William D. Duncan, Yongqun (Oliver) He, Julie Bowker, Hollen Reicher, Jie Zheng, Feng Yu Yeh, Jeremy Ravenel, B. Damayanthi Jesudas, Rachel A. Mavrovich, Sean Kindya.