1 of 27

MEDIFFUSE

Authors:

  1. Saurav Dosi
  2. Animesh Maheshwari
  3. Pratiksha Aigal
  4. Varad Abhyankar

Transforming Medical Scans with Diffusion-powered AI

Advised by Prof. Yapeng Tian

2 of 27

Problem Overview

Need

Medical imaging is critical for accurate disease diagnosis and treatment. CT scans are fast and cost-effective but offer poor soft-tissue contrast, while MRI provides superior soft-tissue visualization but is expensive and slow. There is a need to combine the speed of CT with the diagnostic power of MRI.

Approach

This project uses diffusion models to generate MRI-like images from CT scans.

Goal

Bridge the gap between CT and MRI while preserving key anatomical details.

3 of 27

Methodology

01

4 of 27

Data & Preprocessing

  • We used paired CT-MRI scans from the SynthRAD2023 Challenge dataset [1].
  • Each CT and MRI scan was processed to extract central axial slices.
  • The extracted slices were resized to a uniform 512×512 resolution.
  • We also generated contour maps from the MRI scans using Canny edge detection to condition the training of ControlNet.

  • NIfTI (Neuroimaging Informatics Technology Initiative) compressed files were converted into extracted image slices and contours.
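The slice-extraction and normalization steps above can be sketched as follows. This is a minimal illustration with hypothetical function names; in practice the volume would be loaded with a NIfTI reader such as nibabel, and the 512×512 resize done with an image library (e.g. cv2.resize):

```python
import numpy as np

def central_axial_slice(volume: np.ndarray) -> np.ndarray:
    """Take the middle slice along the last (axial) axis of a 3D volume."""
    return volume[:, :, volume.shape[2] // 2]

def normalize_to_uint8(slice_2d: np.ndarray) -> np.ndarray:
    """Min-max scale intensities to [0, 255] for image export."""
    lo, hi = slice_2d.min(), slice_2d.max()
    scaled = (slice_2d - lo) / max(hi - lo, 1e-8)
    return (scaled * 255).astype(np.uint8)

# In practice the volume would come from a NIfTI file, e.g.:
#   volume = nibabel.load("ct.nii.gz").get_fdata()
# Here we use a synthetic stand-in volume.
volume = np.random.rand(64, 64, 32)
img = normalize_to_uint8(central_axial_slice(volume))
print(img.shape, img.dtype)  # (64, 64) uint8
```

Canny edge detection on the resulting slice (e.g. cv2.Canny) then yields the contour maps used to condition ControlNet.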

5 of 27

Models Trained

We trained a ControlNet [2] to generate MRI scans, using spatial conditioning to guide the diffusion process and preserve anatomical structures. MRI contours served as guidance to improve spatial fidelity during CT-to-MRI translation.

ControlNet

We also fine-tuned InstructPix2Pix [3], an instruction-driven image-editing diffusion model, adapting it to generate MRI-like output from CT input images.

InstructPix2Pix

6 of 27

Model Architecture

02

7 of 27

Stable Diffusion

Stable Diffusion [4] is a latent diffusion model that generates images by iteratively denoising a latent representation guided by a text prompt. It projects images into a lower-dimensional latent space for computational efficiency, then refines them through a learned denoising UNet. For image-to-image tasks the model can be conditioned on an input image and a guiding prompt.
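The core of the iterative denoising described above is the reverse-diffusion update: at each step the model's predicted noise is subtracted out, then a small amount of fresh noise is re-injected. The toy numpy sketch below shows the classic DDPM form of this step on a dummy "latent"; it is illustrative only — Stable Diffusion runs this in latent space with learned samplers such as DDIM or PNDM:

```python
import numpy as np

def ddpm_step(x_t, eps_pred, alpha_t, alpha_bar_t, sigma_t, noise):
    """One reverse-diffusion (denoising) step: remove the predicted noise
    component, then add back a small amount of fresh noise (sigma_t * noise)."""
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_t)
    return mean + sigma_t * noise

# Toy run: a 4x4 "latent", a dummy zero noise prediction, sigma=0 (deterministic).
x = np.ones((4, 4))
eps = np.zeros((4, 4))
x_prev = ddpm_step(x, eps, alpha_t=0.99, alpha_bar_t=0.5, sigma_t=0.0,
                   noise=np.zeros((4, 4)))
print(x_prev[0, 0])  # 1 / sqrt(0.99) ≈ 1.005
```

Repeating this step from pure noise down to step 0, with the noise prediction conditioned on the text prompt (and, for image-to-image, on the input image), produces the final latent that the decoder maps back to pixels.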

(Image: Stable Diffusion Clearly Explained!)

8 of 27

ControlNet

ControlNet is an extension of Stable Diffusion that adds explicit spatial control over the image generation process. It introduces additional input conditions—like edges, depth maps, or annotated CT images—that guide the denoising steps in a parallel branch fused with the main UNet. This allows precise alignment between the input image and the generated output, making it ideal for medical applications where anatomical accuracy is crucial.
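A key design detail of ControlNet is that the control branch is fused into the frozen UNet through zero-initialized 1×1 convolutions, so at the start of training the branch contributes nothing and the model behaves exactly like the Stable Diffusion backbone. A toy numpy sketch of this idea (shapes and names are illustrative, not the actual architecture):

```python
import numpy as np

def zero_conv(features, weight, bias):
    """A 1x1 convolution, as used at ControlNet's connection points.
    features: (C, H, W); weight: (C_out, C); bias: (C_out,)."""
    c, h, w = features.shape
    out = weight @ features.reshape(c, h * w) + bias[:, None]
    return out.reshape(weight.shape[0], h, w)

base_features = np.random.rand(8, 16, 16)     # frozen UNet activations
control_features = np.random.rand(8, 16, 16)  # control-branch activations

# Zero-initialized weights: the control branch adds nothing at step 0,
# so training starts exactly from the pretrained backbone and the
# spatial conditioning is learned gradually.
w0, b0 = np.zeros((8, 8)), np.zeros(8)
fused = base_features + zero_conv(control_features, w0, b0)
assert np.allclose(fused, base_features)
```

This is what makes ControlNet stable to fine-tune on relatively small datasets like ours: the spatial conditioning is grafted on without disturbing the pretrained weights.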

(Image: Stable Diffusion — ControlNet Clearly Explained!)

9 of 27

InstructPix2Pix

InstructPix2Pix extends Stable Diffusion by enabling instruction-based image editing using natural language. Trained on triplets of (input image, instruction, output image), it learns to follow high-level editing commands. It supports Img2Img generation while responding dynamically to user-specified textual modifications.
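At inference time, InstructPix2Pix combines three noise predictions — unconditional, image-conditioned, and image+instruction-conditioned — with separate guidance scales for the input image and the text instruction. A small numpy sketch of that combination (scale values are illustrative defaults, not our tuned settings):

```python
import numpy as np

def ip2p_guidance(eps_uncond, eps_img, eps_full, s_img=1.5, s_txt=7.5):
    """Classifier-free guidance with two scales, as in InstructPix2Pix:
    s_img controls fidelity to the input image, s_txt to the instruction."""
    return (eps_uncond
            + s_img * (eps_img - eps_uncond)
            + s_txt * (eps_full - eps_img))

# Sanity check: with both scales set to 1 the result is just the fully
# conditioned prediction.
u, i, f = np.zeros(4), np.ones(4), np.full(4, 2.0)
out = ip2p_guidance(u, i, f, s_img=1.0, s_txt=1.0)
print(out)  # [2. 2. 2. 2.]
```

Raising s_img pulls the output toward the source CT's structure, while raising s_txt pushes it harder toward the "make this look like an MRI" instruction — a trade-off worth tuning for medical translation.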

(Image: Image Editing with InstructPix2Pix and OpenVINO)

10 of 27

Experimental Setup

  1. Dataset - Central axial slices from paired CT-MRI scans in the SynthRAD2023 Challenge dataset, split into 2398 training image pairs and 126 test image pairs.
  2. Preprocessing - Intensities normalized; contours generated with Canny edge detection for ControlNet.
  3. Backbone - Stable Diffusion 1.5
  4. ControlNet - Conditioned on Canny edges extracted from CT scans
  5. InstructPix2Pix - Trained on CT scans
  6. Training Steps - a) ControlNet: 35k; b) InstructPix2Pix: 15k
  7. Batch Size - 2-4
  8. Hardware - 2x A5000 GPUs
  9. Evaluation Metrics - PSNR (dB), SSIM, and average inference time on the test dataset.
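For reference, ControlNet training runs like the one above are commonly driven through the Hugging Face diffusers example scripts. The command below is a hypothetical sketch only — the script name, flags, and paths are assumptions, not our exact invocation:

```shell
# Hypothetical launch command modeled on the diffusers ControlNet example;
# flag values mirror the setup above, paths and dataset wiring are placeholders.
accelerate launch train_controlnet.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --output_dir="./controlnet-ct2mri" \
  --resolution=512 \
  --train_batch_size=4 \
  --max_train_steps=35000 \
  --learning_rate=1e-5
```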

11 of 27

Results & Evaluation

03

12 of 27

Pop Quiz

Zero-Shot Stable Diffusion 1.5

ControlNet

InstructPix2Pix

Real

13 of 27

Generated Images (ControlNet)

CT scan

Contours

Generated MRI

GT MRI

14 of 27

Generated Images (InstructPix2Pix)

CT scan

Generated MRI

GT MRI

15 of 27

Generated Images (InstructPix2Pix)

MRI scan

Generated CT

GT CT

16 of 27

Evaluation Metrics

Structural Similarity Index Measure (SSIM)

SSIM measures the perceptual similarity between the generated MRI image and the ground-truth MRI image by comparing luminance, contrast, and structural information.

Peak Signal-to-Noise Ratio (PSNR)

PSNR quantifies reconstruction quality by measuring the pixel-wise error between the generated and real MRI images.
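PSNR has a short closed form; a minimal numpy implementation (SSIM is more involved — in practice a library routine such as skimage.metrics.structural_similarity is the usual choice):

```python
import numpy as np

def psnr(reference, generated, data_range=255.0):
    """Peak Signal-to-Noise Ratio in dB: higher means smaller pixel-wise error."""
    mse = np.mean((reference.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(data_range ** 2 / mse)

# Example: a constant offset of 10 gray levels on an 8-bit image.
ref = np.zeros((64, 64), dtype=np.uint8)
gen = np.full((64, 64), 10, dtype=np.uint8)
print(round(psnr(ref, gen), 2))  # 28.13
```

Identical images give infinite PSNR; values in the high 20s and above are typical of usable reconstructions on 8-bit medical slices.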

17 of 27

Results

Train set: 2398 image pairs

Test set: 126 image pairs

18 of 27

Metrics Visualization

Model wise comparison of PSNR, SSIM & Inference Time

19 of 27

Metrics Visualization

PSNR vs SSIM (colored by Inference Time)

InstructPix2Pix

ControlNet

20 of 27

Mediffuse: In a nutshell

21 of 27

Mediffuse: Anomaly Detection

Subtle Calcification Case: Mediffuse detected a lesion in the generated MRI where the CT showed none.

Model Insight: The model can infer pathology absent on CT.

Next Steps: Add diverse scans and metadata (field strength, slice spacing, CLIP embeddings).

Try It: See results/ or launch the live demo.

22 of 27

VIDEO DEMO

23 of 27

24 of 27

Future work

25 of 27

  • 3D DICOM Input & Volumetric Processing: Support full 3D DICOM series to capture spatial context and enable volumetric lesion tracking.
  • Patient Metadata Integration: Condition on clinical variables (age, history, biomarkers, lab results, genetic profiles) to establish robust anomaly priors.
  • Advanced Multi‑Modal Conditioning: Fuse PET, diffusion MRI, functional CT, and structural contours through cross‑modal attention for richer pathology context.
  • Uncertainty‑Aware Synthesis: Embed Bayesian layers or MC‑dropout to generate voxel‑level confidence maps and flag low‑confidence regions.
  • Federated Learning for Scanner Diversity: Train across multiple institutions without moving data to capture hardware variability while preserving privacy.
  • Active Learning & Human‑in‑the‑Loop: Use radiologist annotations on challenging cases to continuously fine‑tune the model and reduce hallucinations.
  • Lightweight Student Model: Apply knowledge distillation plus pruning and quantization to deliver a sub‑100 MB model for real‑time CPU/GPU inference.
  • End‑to‑End Clinical Validation: Automate SSIM/PSNR benchmarking on held‑out cohorts and run reader studies to assess diagnostic accuracy before deployment.
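The uncertainty-aware synthesis bullet above can be illustrated with a toy MC-dropout loop: run the same input through a stochastic model several times and use the per-voxel variance of the outputs as a confidence map. Everything here is a stand-in sketch — the "model" below is a dummy, not Mediffuse:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_passes(predict, x, n_passes=20, drop_p=0.1):
    """Collect several stochastic forward passes; the per-voxel variance
    across passes serves as an uncertainty (confidence) map."""
    outs = np.stack([predict(x, drop_p) for _ in range(n_passes)])
    return outs.mean(axis=0), outs.var(axis=0)

# Stand-in "model": identity with random dropout, purely for illustration.
def toy_predict(x, drop_p):
    mask = rng.random(x.shape) > drop_p
    return x * mask / (1 - drop_p)

x = np.ones((8, 8))
mean_map, var_map = mc_dropout_passes(toy_predict, x)
print(mean_map.shape, var_map.shape)  # (8, 8) (8, 8)
```

High-variance voxels would be flagged as low-confidence regions for radiologist review rather than trusted as synthesized anatomy.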


26 of 27

References

27 of 27

THANKS!

Open for QA!