1 of 24

Clinical Data Analysis Using OpenAI API

Korean Genome Organization | 27 Sep 2025

Seunggeun Lee

Clinical Data Analysis using OpenAI API

2 of 24

Contents

2

2

  1. Understanding APIs
  2. Advanced Topics
  3. Tutorial: Integrating OpenAI API with Python
  4. Clinical Note Analysis
  5. Building Gene Embeddings

Clinical Data Analysis using OpenAI API

3 of 24

1. Understanding APIs

3

3

Clinical Data Analysis using OpenAI API

4 of 24

1. Understanding APIs: What is an API

4

4

  • Application Programming Interface: A communication bridge that allows different software applications to interact with each other.
    • Client: The party that makes the request.
    • Server: The party that fulfills the request.
    • API: The intermediary that connects the two in a well-documented, predictable way.

Client

Server

Request

Output

Clinical Data Analysis using OpenAI API

https://www.oracle.com/kr/cloud/cloud-native/api-management/what-is-api/

5 of 24

1. Understanding APIs: Key Functions of an API

5

5

  • Data Exchange:
    • Enables data sharing between different applications.
  • Function Invocation (Execution):
    • Allows one application to call and use functions from another.
  • Service Integration:
    • Supports combining multiple services into a single application.

Clinical Data Analysis using OpenAI API

6 of 24

1. Understanding APIs: What is the OpenAI API

6

6

Clinical Data Analysis using OpenAI API

  • The OpenAI API allows developers to access AI models (like GPT-5, GPT-4, text-embedding) via simple requests.
    • Access to text, image, and audio models from one platform.
    • Flexible usage: content generation, chatbots, and more.

Provider

API Name

Main Models

OpenAI

OpenAI API

GPT-5

Google

Gemini API

Gemini 2.5 pro

Anthropic

Claude API

Claude Opus 4.1

/ Claude Sonnet 4

xAI

Grok API

Grok‑4

Major Generative AI APIs

7 of 24

1. Understanding APIs: Key Concepts - with Practical Understanding(1)

7

7

Clinical Data Analysis using OpenAI API

  1. Text Generation Models (GPT-5)
  2. Respond to prompts with generated text.
  3. Use cases: summarization, Q&A, code, chatbot
  4. Prompt design: how you instruct the model

OpenAI also provides embedding models.

2. Embeddings

  • Convert text into vector representations.
  • Similar meanings → closer vectors.
  • Used in: search, research use, etc.

https://platform.openai.com/docs/guides/text

https://platform.openai.com/docs/guides/embeddings

8 of 24

1. Understanding APIs: Key Concepts - with Practical Understanding(2)

8

8

Clinical Data Analysis using OpenAI API

3. Tokens

  • Units of text processed by models.
  • Pricing & length limits are based on token counts.

Misunderstanding tokens can lead to:

Unexpected charges, input length errors, or incomplete outputs.

https://platform.openai.com/tokenizer

9 of 24

1. Understanding APIs: OpenAI API Pricing

Graduate School of Data Science Master’s Thesis

9

9

Clinical Data Analysis using OpenAI API

10 of 24

1. Understanding APIs: How to use OpenAI API

Graduate School of Data Science Master’s Thesis

10

10

Clinical Data Analysis using OpenAI API

  1. Get Your API Key:
  2. Create an account at https://platform.openai.com
  3. Go to “API Keys” in the dashboards
  4. Click “Create new secret key”
  5. Copy and save the key somewhere safe

→ You won’t be able to view it again later

11 of 24

1. Understanding APIs: How to use OpenAI API

Graduate School of Data Science Master’s Thesis

11

11

Clinical Data Analysis using OpenAI API

https://platform.openai.com/chat/edit?models=gpt-4o

[Python Example with OpenAI API]

[OpenAI Platform Example with OpenAI API]

12 of 24

2. Advanced Topics

12

12

Clinical Data Analysis using OpenAI API

13 of 24

2. Advanced Topics: Retrieval-Augmented Generation (RAG)

Graduate School of Data Science Master’s Thesis

13

13

Clinical Data Analysis using OpenAI API

https://www.bentoml.com/blog/building-rag-with-open-source-and-custom-ai-models

14 of 24

2. Advanced Topics: Tool call

Graduate School of Data Science Master’s Thesis

14

14

Clinical Data Analysis using OpenAI API

https://python.langchain.com/docs/concepts/tool_calling/

15 of 24

2. Advanced Topics: What is an Agent?

15

Step 1: Doctor uploads Clinical notes

Step2: LangChain applies prompt template

Step 3: LLM processes unstructured data

Step 4: Output Summary

A: Clinical data

(EHR, labs, demographics)

B: Summarize Clinical Notes

(API call)

C. Extract Diagnoses

(API call)

If Low Risk:

Proceed to Discharge Report

Generate Discharge Report

(LLM API)

B. Extract Medications

(API call)

D. Extract Lab Finding

(API call)

E. Risk Stratification

If High Risk,

Generate Alert for Physician

If Medium Risk, Request Additional Test

Re-check Results

Final Discharge Summary

  • Simple API calls are not enough for complex pipelines
    • Researchers need a framework to connect data → reasoning → reporting seamlessly

16 of 24

2. Advanced Topics: Local LLM

Graduate School of Data Science Master’s Thesis

16

16

Clinical Data Analysis using OpenAI API

17 of 24

2. Tutorial (1): Clinical Note Analysis

Graduate School of Data Science Master’s Thesis

17

17

Clinical Data Analysis using OpenAI API

18 of 24

2. Tutorial (1): Understanding Clinical Notes

Graduate School of Data Science Master’s Thesis

18

18

Clinical Data Analysis using OpenAI API

  • Unstructured, narrative documentation created during clinical care.
  • Written by physicians, nurses, etc.
  • Vary in format: discharge summaries, progress notes, admission notes, etc.

https://www.mindbowser.com/how-to-improve-efficiency-when-writing-clinical-notes-in-ehr/

19 of 24

2. Tutorial (1): Styles and Characteristics of Clinical Notes

Graduate School of Data Science Master’s Thesis

19

19

Clinical Data Analysis using OpenAI API

  • Styles and structure can vary by institution, department, or even individual clinician.
  • Written under time pressure, leading to fragmented or telegraphic sentences.
  • Often include abbreviations, medical jargon, and shorthand expressions.
  • Include temporally evolving information(e.g., changes in symptoms, medication adjustments).

#. T2DM since 2018

PHx of CKD stage 3b (baseline Cr ~1.9)

C-peptide 2.2 (2022.1), GAD Ab (-)

Started metformin 2024.6 → lactic acidosis + hypoglycemia → 중단

AG metabolic acidosis on admission, resolved with IVF

#. HTN

BP stable on admission (90/54), later normalized with volume repletion

Amlodipine maintained

#. CKD

Cr 2.3 on admission (baseline ~1.9), improved to 1.7 by discharge

eGFR 32 → no dialysis needed

#. Depression

Sertraline 재시작, no suicidal ideation

Mood improved as delirium resolved

History of Present Illness:

___ female with history of T2DM, CKD stage 3, hypertension, and osteoarthritis who presented from skilled nursing facility with nausea, poor oral intake, and altered mental status per caregiver report.

Patient had reportedly been having progressive fatigue over the past week….

….

Past Medical History:

1. Type 2 Diabetes Mellitus

2. Chronic Kidney Disease Stage 3b (baseline Cr ~1.9)

3. Hypertension

4. Osteoarthritis (knees, spine)

5. Depression

….

Social History:

Family History:

Mother - diabetes, dementia

Father - unknown

Physical Exam on Admission:

Vitals: T 97.1 BP 90/54 HR 96 RR 20 SpO2 94% RA

Hospital Course:

# Metformin-Associated Lactic Acidosis:

Patient presented with lethargy, hypoglycemia, and high AG acidosis. Noted to have renal dysfunction and was recently started on metformin. Suspected metformin-associated lactic acidosis (MALA)....

# T2DM:

Initially hypoglycemic, requiring D50 and glucose monitoring. All antihyperglycemics held initially. Later transitioned to low-dose basal inslin.

# CKD:

Known baseline Cr ~1.9, presented with Cr 2.3. Likely prerenal component due to volume depletion. IVF improved renal function. Electrolytes monitored.

Discharge Medications:

1. Lantus 8 units SC QHS

2. Sertraline 50 mg PO QAM

3. Calcium carbonate 500 mg PO BID

Clinical Note Example 1

Clinical Note Example 2

20 of 24

2. Tutorial (1): The Importance of Clinical Note Analysis

Graduate School of Data Science Master’s Thesis

20

20

Clinical Data Analysis using OpenAI API

  • Data Extraction
  • Clinical Reasoning
  • Data harmonization/cleaning, Summarization, etc
  • Named Entity Recognition
  • Adverse Drug Reaction (ADR) Detection
  • Temporal Reasoning
  • Clinical Summarization
  • Symptom Progression Tracking
  • Causal Linkage Detection

Examples of Tasks via OpenAI API

Drug: Prednisone, Symptom: Melena

Melena after Pd 7.5mg qd” → Possible ADR

After starting metformin, patient developed nausea” → Time-linked causality

Convert fragmented notes into structured summaries

Detect worsening/improving symptoms over time

Likely due to statin” → Infer drug-outcome relationship

Tasks

Examples

21 of 24

2. Tutorial (2): Building Gene Embeddings

21

Clinical Data Analysis using OpenAI API

22 of 24

2. Tutorial (2): What is an embedding?

22

Clinical Data Analysis using OpenAI API

An embedding is a representation of information into numbers in a vector space that captures the original meaning.

Objects represented by similar vectors share more semantic meaning and are closer together - e.g. “liver” is closer to “lung” than it is to “bicycle.”

Embeddings are useful for tasks in which similarity or relationships must be measured, such as search and clustering.

23 of 24

2. Tutorial (2): What is a gene embedding?

23

Clinical Data Analysis using OpenAI API

  • A gene embedding: represents each gene as a numerical vector.
  • Genes with similar roles, co-expression patterns or biological pathways are mapped closer to each other
  • The embedding is learned from various data, including genomics, transcriptomics and other literature KBs

Gene2vec: distributed representation of genes based on co-expression�(Du, J. et al, 2019)

24 of 24

2. Tutorial (2): Make gene-embedding using GPT (GenePT)

24

Clinical Data Analysis using OpenAI API

https://www.biorxiv.org/content/10.1101/2023.10.16.562533v2