MEDIFFUSE
Transforming Medical Scans with Diffusion-powered AI
Authors:
Advised by Prof. Yapeng Tian
Problem Overview
Medical imaging is critical for accurate disease diagnosis and treatment. CT scans are fast and cost-effective but offer poor soft-tissue contrast, while MRI provides superior soft-tissue visualization but is expensive and slow.
Need: Combine the speed of CT with the diagnostic power of MRI.
Approach: Use diffusion models to generate MRI-like images from CT scans.
Goal: Bridge the gap between CT and MRI while preserving key anatomical details.
Methodology
Data & Preprocessing
Models Trained
ControlNet: We trained a ControlNet [2] to generate MRI scans, using spatial conditioning to guide the diffusion process and preserve anatomical structures. MRI contours served as the guidance signal to improve spatial fidelity during CT-to-MRI translation (see the contour-extraction sketch below).
InstructPix2Pix: We also fine-tuned InstructPix2Pix [3], an instruction-driven image-editing diffusion model, adapting it to generate MRI-like output from CT input images.
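For illustration, one plausible way to derive such contour maps is Canny edge detection; a minimal OpenCV sketch (thresholds and filenames are assumptions, not our exact preprocessing):

```python
import cv2

# Derive a contour map from an MRI slice for ControlNet conditioning.
# Thresholds and filenames are illustrative, not the project's settings.
mri = cv2.imread("mri_slice.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(mri, threshold1=50, threshold2=150)  # binary edge map
cv2.imwrite("mri_contours.png", edges)
```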
Model Architecture
Stable Diffusion
Stable Diffusion [4] is a latent diffusion model that generates images by iteratively denoising a latent representation guided by a text prompt. It projects images into a lower-dimensional latent space for computational efficiency, then refines that representation through a learned denoising UNet. For image-to-image tasks, the model can be conditioned on an input image as well as a guiding prompt.
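As a concrete illustration of this image-to-image conditioning, here is a minimal sketch using the Hugging Face diffusers API; the model ID, prompt, and settings are illustrative assumptions, not our training configuration:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

ct_slice = Image.open("ct_slice.png").convert("RGB").resize((512, 512))
out = pipe(
    prompt="axial brain MRI, T1-weighted",
    image=ct_slice,       # encoded into the latent space as the starting point
    strength=0.6,         # how far the denoiser may move away from the input
    guidance_scale=7.5,   # weight of the text prompt
).images[0]
out.save("mri_like.png")
```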
ControlNet
ControlNet is an extension of Stable Diffusion that adds explicit spatial control over the image generation process. It introduces additional input conditions—like edges, depth maps, or annotated CT images—that guide the denoising steps in a parallel branch fused with the main UNet. This allows precise alignment between the input image and the generated output, making it ideal for medical applications where anatomical accuracy is crucial.
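A minimal sketch of spatially conditioned generation with diffusers; the public canny checkpoint stands in here for a ControlNet fine-tuned on MRI contours, and all names and settings are illustrative:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

contours = Image.open("mri_contours.png").convert("RGB")
mri = pipe(
    "axial brain MRI, T1-weighted",
    image=contours,          # spatial condition, processed by the parallel branch
    num_inference_steps=30,
).images[0]
```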
InstructPix2Pix
InstructPix2Pix extends Stable Diffusion by enabling instruction-based image editing using natural language. Trained on triplets of (input image, instruction, output image), it learns to follow high-level editing commands. It supports Img2Img generation while responding dynamically to user-specified textual modifications.
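A minimal sketch of instruction-based editing with diffusers; the public timbrooks/instruct-pix2pix checkpoint and the instruction below stand in for our fine-tuned model:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

ct_slice = Image.open("ct_slice.png").convert("RGB")
mri = pipe(
    "convert this CT scan into a T1-weighted MRI",  # natural-language edit
    image=ct_slice,
    num_inference_steps=30,
    image_guidance_scale=1.5,  # how closely the output follows the input image
).images[0]
```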
Experimental Setup
Results & Evaluation
Pop Quiz: Zero-Shot Stable Diffusion 1.5 · ControlNet · InstructPix2Pix · Real
Generated Images (ControlNet): CT scan · Contours · Generated MRI · GT MRI
Generated Images (InstructPix2Pix): CT scan · Generated MRI · GT MRI
Generated Images (InstructPix2Pix): MRI scan · Generated CT · GT CT
Evaluation Metrics
Structural Similarity Index Measure (SSIM): measures the perceptual similarity between the generated MRI and the ground-truth MRI by comparing luminance, contrast, and structural information.
Peak Signal-to-Noise Ratio (PSNR): quantifies reconstruction quality by measuring the pixel-wise error between the generated and real MRI images.
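Both metrics are available off the shelf; a minimal sketch with scikit-image, assuming aligned slices of equal shape scaled to [0, 1]:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(generated: np.ndarray, ground_truth: np.ndarray):
    """Score one generated MRI slice against its ground-truth pair."""
    psnr = peak_signal_noise_ratio(ground_truth, generated, data_range=1.0)
    ssim = structural_similarity(ground_truth, generated, data_range=1.0)
    return psnr, ssim
```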
Results
Train set: 2398 image pairs
Test set: 126 image pairs
Metrics Visualization
Model-wise comparison of PSNR, SSIM & inference time
Metrics Visualization
PSNR vs SSIM (colored by inference time): InstructPix2Pix vs ControlNet
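A sketch of how such a scatter plot can be produced with matplotlib, assuming a hypothetical results.csv with per-image columns model, psnr, ssim, and inference_time:

```python
import matplotlib.pyplot as plt
import pandas as pd

# results.csv is a hypothetical export of the per-image evaluation scores.
df = pd.read_csv("results.csv")
sc = plt.scatter(df["psnr"], df["ssim"], c=df["inference_time"], cmap="viridis")
plt.colorbar(sc, label="Inference time (s)")
plt.xlabel("PSNR (dB)")
plt.ylabel("SSIM")
plt.title("PSNR vs SSIM (colored by inference time)")
plt.show()
```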
Mediffuse: In a nutshell
Mediffuse: Anomaly Detection
Subtle Calcification Case: Mediffuse detected a lesion in MRI where CT showed none.
Model Insight: Can infer pathology absent on CT.
Next Steps: Add diverse scans and metadata (field strength, slice spacing, CLIP embeddings).
Try It: See results/ or launch the live demo.
VIDEO DEMO
Future work
References
THANKS!
Open for Q&A!