Role of ChatGPT in OBO Ontology Development
Sierra Moxon
Lawrence Berkeley National Lab
ICBO Ontologies Tutorial: August 2023
smoxon@lbl.gov
Helpful resources
https://learn.deeplearning.ai/chatgpt-prompt-eng - great guided tutorial, some code examples.
Thanks!
Especially to: J. Harry Caufield, Marcin Joachimiak, Justin Reese, Harshad Hegde, Chris Mungall
How did I get started?
Key to learning a new tool is play.
Try to trick the tool into making ridiculously hallucinated disorders and diseases.
“kool-aid man syndrome is a…”
How do I use LLMs/Machine Learning daily?
LLMs
Machine Learning
Prompt Engineering: Talking to an extremely well-read teenager.
DEMO: check algorithm results quickly
DEMO: find all the obsolete terms in an ontology
“find me all the obsolete terms in MONDO disease ontology”
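An answer to a prompt like this can be checked against the ontology file itself. The following is a minimal sketch, not the method used in the demo. It assumes the ontology is available as OBO Graphs JSON, where each node may carry a `meta.deprecated` flag; the function name and the tiny example data (with a made-up `EX` prefix) are hypothetical.

```python
from typing import List, Tuple


def find_obsolete_terms(ontology: dict) -> List[Tuple[str, str]]:
    """Return (id, label) pairs for nodes flagged as deprecated.

    Assumes the OBO Graphs JSON layout: graphs -> nodes -> meta.deprecated.
    """
    obsolete = []
    for graph in ontology.get("graphs", []):
        for node in graph.get("nodes", []):
            meta = node.get("meta") or {}
            if meta.get("deprecated") is True:
                obsolete.append((node["id"], node.get("lbl", "")))
    return obsolete


# Made-up example data in the assumed layout; real use would json.load() a file.
example_ontology = {
    "graphs": [
        {
            "nodes": [
                {"id": "http://purl.obolibrary.org/obo/EX_0000001",
                 "lbl": "example term"},
                {"id": "http://purl.obolibrary.org/obo/EX_0000002",
                 "lbl": "obsolete example term",
                 "meta": {"deprecated": True}},
            ]
        }
    ]
}

obsolete_terms = find_obsolete_terms(example_ontology)
```

Comparing this list with the LLM's answer is a quick way to spot hallucinated or missing terms.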
DEMO
Given a list of terms separated by commas (the list is surrounded by triple backticks): ```Cell, Neuron, Hippocampus, Microarray``` I want you to use lexical matching to find potential matches to existing ontology terms.
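For comparison, the same kind of lexical matching can be sketched in a few lines of code. This is only an illustration under stated assumptions: the label dictionary, its `EX:` identifiers, and the function names are hypothetical, and "lexical matching" is reduced here to exact matching after normalizing case and punctuation.

```python
import re
from typing import Dict, List


def normalize(label: str) -> str:
    """Lowercase a label and collapse non-alphanumeric runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def lexical_matches(terms: List[str],
                    ontology_labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each input term to the ids whose normalized label equals it."""
    index: Dict[str, List[str]] = {}
    for term_id, label in ontology_labels.items():
        index.setdefault(normalize(label), []).append(term_id)
    return {term: index.get(normalize(term), []) for term in terms}


# The comma-separated list from the prompt, split into individual terms.
terms = [t.strip() for t in "Cell, Neuron, Hippocampus,Microarray".split(",")]

# Hypothetical id -> label table standing in for a real ontology.
labels = {"EX:0001": "cell", "EX:0002": "neuron", "EX:0003": "hippocampus"}

matches = lexical_matches(terms, labels)
```

Terms with an empty match list (here `Microarray`) are candidates for manual review or for a different source ontology.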
Add ClinGen xrefs to MONDO
“using the mondo disease ontology json file, can you return, in a simple table, a list of mondo term ids and labels that do not have ClinGen xrefs”
“Can you please make the same table, but I only want IDs that have the string "MONDO" in them”
“I just uploaded another file, the clingen curation activity summary report. I see some MONDO ids in this file. Can we use this file with the table you just generated (the full table, not just a few examples that you printed on the screen), to generate a mapping table linking MONDO ids that do not have clingen xrefs with the xrefs in the "disease_url" column of this new file?”
“the id attribute of each object in the mondo json file holds a full URL. In order to map the id in the mondo file to the id in the fifth column of the clingen file, we need to extract the id from the url in the mondo json file. We should extract everything after (and including) the string MONDO in the id field of the json file, and replace the "_" with ":". Once we have these transformed ids, we can map them to the clingen file”
“can you show me examples of this table?”
“instead of showing me "URL for MONDO:..." just show me the full url in the table”
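The prompt sequence above describes a small data-wrangling pipeline: keep only MONDO ids, turn each URL into an id with ":" in place of "_", drop terms that already have a ClinGen xref, and join the rest to the ClinGen report. A rough sketch of those steps follows. It assumes the MONDO file uses the OBO Graphs JSON layout (`meta.xrefs` entries with a `val` field). The xref prefix `CLINGEN:`, the `mondo_id` column name, the function names, and all example data are hypothetical; only the `disease_url` column name comes from the prompts.

```python
from typing import Dict, List, Optional, Tuple


def to_curie(url: str) -> Optional[str]:
    """Keep everything from 'MONDO' onward and replace '_' with ':'."""
    start = url.find("MONDO")
    if start == -1:
        return None  # not a MONDO id
    return url[start:].replace("_", ":")


def terms_without_xref(ontology: dict, xref_prefix: str) -> Dict[str, str]:
    """Return {MONDO id: label} for nodes lacking an xref with the given prefix."""
    missing = {}
    for graph in ontology.get("graphs", []):
        for node in graph.get("nodes", []):
            curie = to_curie(node.get("id", ""))
            if curie is None:
                continue
            xrefs = (node.get("meta") or {}).get("xrefs", [])
            if not any(x.get("val", "").startswith(xref_prefix) for x in xrefs):
                missing[curie] = node.get("lbl", "")
    return missing


def build_mapping(missing: Dict[str, str],
                  clingen_rows: List[dict]) -> List[Tuple[str, str, str]]:
    """Join terms without xrefs to report rows on the MONDO id."""
    return [
        (row["mondo_id"], missing[row["mondo_id"]], row["disease_url"])
        for row in clingen_rows
        if row.get("mondo_id") in missing
    ]


# Made-up data; real use would load the MONDO JSON and the ClinGen report.
base = "http://purl.obolibrary.org/obo/"
ontology = {"graphs": [{"nodes": [
    {"id": base + "MONDO_0000001", "lbl": "example disease A"},
    {"id": base + "MONDO_0000002", "lbl": "example disease B",
     "meta": {"xrefs": [{"val": "CLINGEN:123"}]}},
    {"id": base + "EX_0000003", "lbl": "not a MONDO term"},
]}]}
clingen_rows = [
    {"mondo_id": "MONDO:0000001", "disease_url": "https://example.org/disease/1"},
    {"mondo_id": "MONDO:0000002", "disease_url": "https://example.org/disease/2"},
]

mapping = build_mapping(terms_without_xref(ontology, "CLINGEN:"), clingen_rows)
```

Running the steps in code also makes it easy to verify that the table the LLM printed covers the full data rather than a few examples.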
Write clear and specific instructions
ontogpt
ontogpt: https://github.com/monarch-initiative/ontogpt/tree/main/src/ontogpt/
Use a schema to ask for results in a particular format, feed in some text, and use some code to ground the results in existing ontology terms.
https://github.com/monarch-initiative/ontogpt/blob/main/src/ontogpt/templates/mendelian-disease.yaml
% ontogpt extract -t mendelian_disease.MendelianDisease dentin.txt
Create new ontology terms?
Curate-GPT
curate-gpt: https://github.com/monarch-initiative/curate-gpt
Limitations and gotchas