To Train, or Not to Train: Exploring Foundation Models for TBAD Segmentation
Ayman Abaid¹ and Ihsan Ullah¹,²
¹School of Computer Science, University of Galway, Galway, Ireland
²Insight SFI Research Centre for Data Analytics, University of Galway, Ireland
Background & Motivation
Type B aortic dissection (TBAD) is a critical cardiovascular condition marked by a tear in the descending aorta, where rapid and accurate lumen segmentation is crucial for effective diagnosis and prognosis. Manual segmentation is labor-intensive and costly, underscoring the need for improved diagnostic methods to boost clinical outcomes. While advances in deep learning have significantly improved medical image segmentation, developing models for new datasets remains challenging due to the need for precise annotations. Vision foundation models (FMs) may offer promising solutions to these challenges.
Research Questions
Can vision FMs segment the true lumen, false lumen, and false lumen thrombus in TBAD CTA images in zero- and few-shot settings, without any task-specific training? And how does their performance compare with a UNet trained directly on the target dataset?
Methodology
We utilized the ImageTBAD [1] dataset, which consists of 100 CTA volumes from patients diagnosed with TBAD. We evaluated three foundation models: SAM, MedSAM, and UniverSeg.
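All three FMs are 2D models, so each 3D CTA volume must first be decomposed into axial slices. The sketch below is a minimal illustration of this preprocessing step, assuming NIfTI-format volumes and the nibabel library; the function name, file paths, and the rule of keeping only labelled slices are our assumptions, not necessarily the authors' exact pipeline.

import numpy as np
import nibabel as nib

def iter_axial_slices(image_path, label_path):
    """Yield axial (image, mask) 2D slice pairs from one CTA volume."""
    vol = nib.load(image_path).get_fdata()                    # (H, W, D) intensities
    seg = nib.load(label_path).get_fdata().astype(np.uint8)   # (H, W, D) class labels
    for z in range(vol.shape[-1]):
        img2d, msk2d = vol[..., z], seg[..., z]
        if msk2d.any():   # keep only slices that contain labelled lumen
            yield img2d, msk2d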
Experiments and Results
Key Findings
We evaluated all models on the same TBAD test data for direct comparison, varying prompt types and settings. For SAM, we tested both point and bounding box (BB) prompts; MedSAM was assessed with BB prompts; and UniverSeg was tested with support sets of 16, 32, and 64 labelled examples. A UNet trained on the same dataset served as the baseline. Illustrative sketches of the zero- and few-shot protocols are given under the headings below.
Zero-Shot Learning
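As an illustration of the zero-shot protocol, the sketch below prompts SAM with a bounding box derived from the ground-truth mask and, alternatively, with a single foreground point at the mask centroid; MedSAM takes BB prompts in the same spirit. It assumes the official segment-anything package and a slice prepared as an RGB array (image_rgb) with a binary per-class mask (gt_mask); the checkpoint path and prompt-derivation rules are illustrative assumptions, not necessarily the authors' exact setup.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def mask_to_box(mask2d):
    """Tight (x0, y0, x1, y1) bounding box around a binary 2D mask."""
    ys, xs = np.nonzero(mask2d)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # checkpoint path is an assumption
predictor = SamPredictor(sam)
predictor.set_image(image_rgb)   # HxWx3 uint8 CTA slice, assumed prepared upstream

# Bounding-box (BB) prompt derived from the ground-truth mask of one class
bb_mask, _, _ = predictor.predict(box=mask_to_box(gt_mask), multimask_output=False)

# Single foreground point prompt placed at the mask centroid
cy, cx = np.argwhere(gt_mask).mean(axis=0)
pt_mask, _, _ = predictor.predict(
    point_coords=np.array([[cx, cy]]),
    point_labels=np.array([1]),   # 1 marks a foreground click
    multimask_output=False,
)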
Few-Shot Learning
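In the few-shot setting, UniverSeg predicts a binary mask for a target slice conditioned on a support set of labelled slices. The sketch below assumes the public universeg package (github.com/JJGO/UniverSeg) and tensors already resized to 128x128, UniverSeg's required input resolution; the tensor names are hypothetical. It shows how the support-set size can be varied across 16, 32, and 64 as in the experiments.

import torch
from universeg import universeg

model = universeg(pretrained=True).eval()

@torch.no_grad()
def few_shot_predict(target, support_images, support_labels):
    """target: (1, 1, 128, 128); support_*: (1, S, 1, 128, 128); intensities in [0, 1]."""
    logits = model(target, support_images, support_labels)   # (1, 1, 128, 128)
    return (torch.sigmoid(logits) > 0.5).float()

# Vary the support-set size as in the experiments
for s in (16, 32, 64):
    pred = few_shot_predict(target, support_images[:, :s], support_labels[:, :s])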
Table 1: Performance comparison of UniverSeg, MedSAM, and UNet on CTA image segmentation
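A hedged sketch of the evaluation metric, assuming the table reports the Dice similarity coefficient, the standard overlap measure for segmentation; the per-class formulation below is our assumption.

import numpy as np

def dice(pred, gt, cls):
    """Dice score for one label (e.g., true lumen, false lumen, FL thrombus)."""
    p, g = (pred == cls), (gt == cls)
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom else 1.0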
Figure 1: Comparison of input images, ground truth (GT), and predictions by UNet and three FMs. (a) Input image and GT. (b) Prediction by UNet. (c) Prediction by MedSAM using input images with BB coordinates. (d) Prediction by UniverSeg with a support set size of 64. (e) Prediction by SAM using BB coordinates. (f) Prediction by SAM using point coordinates. In the predicted masks and GT, blue regions represent False Lumen Thrombus, red regions denote False Lumen, and green regions correspond to True Lumen.
Acknowledgements
This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant No. SFI/12/RC/2289_P2 (the Insight SFI Research Centre for Data Analytics), and of Taighde Éireann – Research Ireland under Grant No. 18/CRT/6223.
References
[1] Z. Yao et al., "ImageTBAD: A 3D computed tomography angiography image dataset for automatic segmentation of type-B aortic dissection," Frontiers in Physiology, vol. 12, 2021.
* This paper has been accepted for presentation at the Irish Machine Vision and Image Processing (IMVIP) Conference 2024.