1 of 32

LLM fine-tuning

with medical records (Theory & Practice)

2025-10-23, 15:00 ~ 17:00

Seongsu Bae, Sujeong Im

KAIST AI @ Edlab (Advised by Edward Choi)

KoSAIM 2025 Hands-on AI Training for Developers: Train your own medical AI

2 of 32

Speaker Bio

Sujeong Im (임수정)

Education

  • POSTECH Creative IT Engineering, B.Sc. (2018-2022)
  • KAIST Kim Jaechul Graduate School of AI, M.Sc. (2023-2025)
  • KAIST Kim Jaechul Graduate School of AI, Ph.D. (2025-)

Research Interests

  • Foundation Model
  • Natural Language Processing
  • Machine Learning for Healthcare

Seongsu Bae (배성수)

Education

  • Hanyang University Mathematics, B.Sc. (2013-2019)
  • KAIST Kim Jaechul Graduate School of AI, M.Sc. (2020-2022)
  • KAIST Kim Jaechul Graduate School of AI, Ph.D. (2022-)

Research Interests

  • Semantic Machine
  • Multimodal Learning
  • Machine Learning for Healthcare

3 of 32

Table of Contents

  • How to build a clinical domain Large Language Model (LLM)? (40 mins)
    • (Large) Language Model
    • How to build a (large) language model?
    • Building an instruction-following LLM in the clinical domain
    • Asclepius (Kweon and Kim et al., ACL 2024 Findings)
  • Hands-on Session: Fine-tuning a clinical domain LLM (80 mins)
    • Environment Setup & Colab Practice
    • LLM memory layout
    • Parameter-Efficient Fine-Tuning (LoRA/QLoRA)

4 of 32

Language Model

5 of 32

We deal with LMs every day!

6 of 32

How to train an LM?

(Figure: the Next Token Prediction task for the sentence “The sky is blue.”: given each prefix, e.g., “The sky is”, the model is trained to predict the next token, e.g., “blue”.)
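
To make the next-token-prediction objective concrete, here is a minimal sketch using the Hugging Face transformers library. GPT-2 is used purely because it is small enough to run anywhere (the hands-on session uses a different model); the prompt matches the slide's example.

```python
# Minimal next-token-prediction sketch. GPT-2 is used only because it is small;
# any causal LM from the Hub behaves the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The sky is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (batch, seq_len, vocab_size)

# Distribution over the vocabulary for the token that follows "The sky is"
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)
for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")
```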

7 of 32

Text Generation via a Probabilistic Model

(Figure: given the prefix “The sky is”, the (large) language model assigns a probability to every candidate next token: continuations such as “blue” or “clear” are more likely, while “usually” or “the” are less likely.)

8 of 32

How to build a (large) language model?

  • Pre-training and Fine-tuning
    • e.g., BERT (2018), T5 (2019)

(Figure: one pretrained LM is fine-tuned separately on task A, task B, and task C, yielding a dedicated model for inference on each task.)

(-) Task-specific training → One specialized model for each task

9 of 32

How to build a (large) language model?

  • Pre-training and Prompting
    • e.g., GPT-3 (2020)

(Figure: a single pretrained LM performs inference on task A, task B, and task C directly via prompting, with no task-specific training.)

(+) Improve performance via few-shot prompting or prompt engineering

10 of 32

How to build a (large) language model?

  • Pre-training and Prompting

(-) Relies on few-shot exemplars packed into the prompt

(-) Requires manual effort for prompt engineering

(-) Not aligned with natural instructions

11 of 32

How to build a (large) language model?

  • Pre-training and Instruction tuning
    • Supervised Fine-Tuning (SFT) on instruction data
    • e.g., FLAN (2021), LLaMA (2023)

(Figure: a pretrained LM is fine-tuned on many instructions; the resulting single model then performs inference on task A, task B, and task C by following natural language instructions.)

(+) model learns to perform many tasks via natural language instructions

12 of 32

How to build a (large) language model?

  • Pre-training and Alignment tuning
    • Supervised Fine-Tuning (SFT) on instruction data + Alignment learning on preference data (e.g., RLHF, DPO)
    • e.g., InstructGPT (2022), ChatGPT (2022), Llama 2 (2023), Llama 3 (2024)

13 of 32

Building an instruction-following LLM

  • How can we build an instruction-following LLM?
    • Prepare a pre-trained large language model (e.g., LLaMA 7B)
    • Perform supervised fine-tuning on instruction data (e.g., the Alpaca 52K dataset)
  • How can we build an instruction-following LLM in the clinical domain?
    • Prepare a pre-trained large language model
    • Pre-training on clinical corpus for domain adaptation
    • Perform supervised fine-tuning using domain-specific clinical instruction data
      • Today, we will focus on instruction-following data tailored for clinical notes! (a minimal SFT sketch follows below)
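
To make the supervised fine-tuning step concrete, here is a minimal sketch using the Hugging Face Trainer. The prompt template, the dataset column names ("note", "question", "answer"), and the hyperparameters are illustrative assumptions, not the exact Asclepius recipe.

```python
# Minimal SFT sketch: fine-tune a causal LM on (clinical note, instruction, response)
# triples. Prompt template, column names, and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("starmpcc/Asclepius-Synthetic-Clinical-Notes", split="train")

def to_features(example):
    # Assumed column names: "note", "question", "answer" (check dataset.column_names).
    prompt = (f"### Clinical note:\n{example['note']}\n\n"
              f"### Instruction:\n{example['question']}\n\n"
              f"### Response:\n{example['answer']}{tokenizer.eos_token}")
    return tokenizer(prompt, truncation=True, max_length=1024)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="clinical-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=3,
        learning_rate=2e-5,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Causal LM collator: pads the batch and copies input_ids into labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Note that full fine-tuning like this needs far more memory than a Colab T4 provides; the hands-on session therefore switches to parameter-efficient fine-tuning (LoRA/QLoRA), sketched later.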

14 of 32

Imagine a clinical LLM

  • Given a clinical note, a clinical LLM can perform these tasks as follows:
    • “What medical procedures were performed on the patient during her hospital course, as mentioned in the discharge summary?” → Named Entity Recognition
    • “What abbreviation was expanded using the acronym ‘ANH’ in the diagnosis section of the discharge summary?” → Abbreviation Expansion
    • “When was the patient started on oral acyclovir and what was the duration of treatment?” → Temporal Information Extraction
    • “Can you summarize the patient’s hospital course, treatment, and diagnoses according to the given discharge summary?” → Summarization
    • “What was the reason for the patient’s transfer to ICU and what was the treatment plan for infection-induced respiratory failure?” → Question Answering

15 of 32

Asclepius: Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes (Kweon and Kim et al., ACL 2024 Findings)

16 of 32

Real clinical note

  • Semi-structured text describing patient activity
  • Properties
    • Semi-structured: organized under section headers
    • Frequent acronyms
    • Typos
  • Problem: Protected Health Information (PHI)
    • Using GPT on the notes: PHI ⇒ impractical
    • Human annotation: requires experts ⇒ costly
    • Machine annotation: PHI ⇒ impractical

17 of 32

Case report

  • Written to share a “case” with the community
    • No PHI ⇒ shareable
  • Properties
    • Plain text
    • Fewer acronyms
    • Well-written
  • Content is similar to clinical notes
  • e.g., PMC (PubMed Central) case report

18 of 32

Synthetic clinical note generation

19 of 32

Clinical instruction/response data generation

20 of 32

Final dataset

  • (clinical note, instruction, response) triples ⇒ all synthetic!

21 of 32

Asclepius-Llama3-8B

  • How can we build an instruction-following LLM in the clinical domain?
    • Prepare a pre-trained large language model
      • use Llama3-8B model
    • Pre-training on clinical corpus for domain adaptation
      • Pre-training (1 epoch): 2h 59m on 4× A100 80GB
      • dataset: synthetic clinical notes
    • Perform supervised fine-tuning using domain-specific clinical instruction data
      • Instruction fine-tuning (3 epochs): 30h 41m on 4× A100 80GB
      • dataset: clinical instruction-response pairs with synthetic clinical notes

22 of 32

Hands-on Session:

Fine-tuning a clinical domain LLM

23 of 32

Environment Setup

colab link

24 of 32

Environment Setup

25 of 32

Environment Setup

26 of 32

Colab Objectives

  • Goal: Fine-tuning a clinical domain LLM
  • Environment: Google Colab
  • Dataset: starmpcc/Asclepius-Synthetic-Clinical-Notes
  • Model: microsoft/phi-2 (2.7B; see the loading sketch after this list)
  • CAUTION
    • Never close Colab while the LLM is training.
      • Do not refresh the page
      • Do not click any other buttons in Colab
      • Do not stop the running cell
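
As referenced above, here is a minimal sketch of what the notebook sets up: checking the assigned GPU and loading the dataset and model named on this slide. This is inspection code only; the actual Colab notebook may differ.

```python
# Inspect the Colab environment and the dataset/model used in this session.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Check which GPU Colab assigned (the free tier typically provides a 16 GB T4).
print(torch.cuda.get_device_name(0))
print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

dataset = load_dataset("starmpcc/Asclepius-Synthetic-Clinical-Notes", split="train")
print(dataset)       # number of rows and column names
print(dataset[0])    # one example: a synthetic note with an instruction and response

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto")
print(f"Parameters: {model.num_parameters() / 1e9:.2f}B")  # about 2.7B
```

If the printed column names differ from what the fine-tuning code expects, adjust the prompt formatting accordingly.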

27 of 32

Deep learning memory layout

  • Model size: B (billion) scale
    • x B parameters = x billion floating-point numbers = 2x GB in bf16/fp16 (2 bytes per parameter)
  • Deep learning memory requirements for full fine-tuning
    • model parameters: 2x GB
    • gradients: 2x GB
    • optimizer state: 2x ~ 12x GB
    • Total: 6x ~ 16x GB, plus activations and overhead
  • Our requirements
    • model: phi-2 (2.7B)
    • GPU VRAM: Colab T4 (16 GB)
    • 2.7 × 6 ≈ 16.2 GB > 16 GB ⇒ even the optimistic lower bound does not fit (see the sketch below)
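
The rule of thumb above can be written out explicitly. A rough sketch that ignores activation memory and framework overhead (the "plus alpha"), so real usage is higher still.

```python
# Back-of-envelope memory estimate for full fine-tuning, following the slide's rule of thumb.
def full_finetune_memory_gb(n_params_billion: float) -> tuple[float, float]:
    low = 6 * n_params_billion    # weights (2x) + gradients (2x) + lean optimizer state (2x)
    high = 16 * n_params_billion  # weights (2x) + gradients (2x) + fp32 Adam states (12x)
    return low, high

low, high = full_finetune_memory_gb(2.7)                       # phi-2
print(f"phi-2 full fine-tuning: {low:.1f} ~ {high:.1f} GB")    # 16.2 ~ 43.2 GB
print("Colab T4 VRAM: 16 GB -> even the lower bound does not fit")
```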

28 of 32

Can You Run it?

29 of 32

LoRA (Hu and Shen et al., 2021)
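
A minimal sketch of attaching LoRA adapters with the peft library. The rank, alpha, dropout, and the phi-2 target module names below are illustrative assumptions, not the notebook's exact configuration.

```python
# Minimal LoRA sketch with the peft library. Hyperparameters and the phi-2
# target module names are illustrative assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=16,              # rank of the low-rank update matrices A and B
    lora_alpha=32,     # scaling factor: delta_W = (alpha / r) * B @ A
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed phi-2 module names
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
# The wrapped model is passed to Trainer exactly like before; only the LoRA
# matrices receive gradients, the frozen base weights need no optimizer state.
```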

30 of 32

QLoRA (Dettmers and Pagnoni et al., 2023)
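
A minimal QLoRA sketch: the base model is loaded in 4-bit NF4 via bitsandbytes, then LoRA adapters are trained on top. The quantization settings shown are common defaults, not necessarily the notebook's exact ones.

```python
# Minimal QLoRA sketch: 4-bit NF4 base model (bitsandbytes) + LoRA adapters on top.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,  # matmuls run in fp16 on a T4
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # layer-norm casts, grad checkpointing, etc.

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed phi-2 module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# 4-bit base weights (~1.5 GB for 2.7B params) plus small LoRA adapters and their
# optimizer states fit comfortably within the T4's 16 GB.
```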

31 of 32

Parameter-Efficient Fine-Tuning (PEFT)

32 of 32

Thank you :D

If you require any further information, feel free to contact us: seongsu@kaist.ac.kr, sujeongim@kaist.ac.kr

KoSAIM 2025 Hands-on AI Training for Developers: Train your own medical AI