
To Train, or Not to Train: Exploring Foundation Models for TBAD Segmentation

Ayman Abaid¹ and Ihsan Ullah¹,²

¹School of Computer Science, University of Galway, Galway, Ireland

²Insight SFI Research Centre for Data Analytics, University of Galway, Ireland

Background & Motivation

Type B aortic dissection (TBAD) is a critical cardiovascular condition marked by a tear in the descending aorta, where rapid and accurate lumen segmentation is crucial for effective diagnosis and prognosis. Manual segmentation is labor-intensive and costly, underscoring the need for improved diagnostic methods to boost clinical outcomes. While advances in deep learning have significantly improved medical image segmentation, developing models for new datasets remains challenging due to the need for precise annotations. Vision foundation models (FMs) may offer promising solutions to these challenges.

Research Questions

  1. Can state-of-the-art (SoTA) segmentation FMs fully replace traditional architectures like UNet?
  2. What are the limitations of FMs in multi-label segmentation for medical imaging?
  3. Can fine-tuning FMs with medical data enhance their clinical performance?

Methodology

We utilized the ImageTBAD [1] dataset, which consists of 100 CTA volumes from patients diagnosed with TBAD. We evaluated three foundation models (a minimal inference sketch follows the list):

  • Segment Anything Model (SAM) [3], trained on natural images.
  • MedSAM [4], a variant of SAM fine-tuned on medical data.
  • UniverSeg [2], trained exclusively on medical data.
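
As an illustration, the snippet below sketches zero-shot SAM inference with a bounding-box prompt. The checkpoint name, slice preprocessing, and box coordinates are placeholders for illustration, not the exact pipeline used in this work.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM backbone; the checkpoint path is a placeholder.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# One CTA slice, intensity-windowed and stacked to 3-channel uint8
# (a dummy array here; real slices come from the ImageTBAD volumes).
slice_rgb = np.zeros((512, 512, 3), dtype=np.uint8)
predictor.set_image(slice_rgb)

# A single bounding-box prompt in XYXY format. SAM returns one binary
# mask per prompt, so each lumen class needs its own prompt and call.
box = np.array([120, 140, 260, 300])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
pred_mask = masks[0]  # (H, W) boolean mask for this single ROI
```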

Experiments and Results

Key Findings

  1. FMs trained on natural images fail to outperform SoTA segmentation models, especially on low-contrast images such as TBAD CTA scans.
  2. FMs trained exclusively on medical data (UniverSeg) perform slightly better than FMs merely fine-tuned on medical data (MedSAM).
  3. SAM and MedSAM segment only one ROI per prompt and lack native multi-label segmentation, limiting their use in complex medical imaging tasks (a per-label workaround is sketched after this list).
  4. FMs have the potential to make a significant impact, but they must be applied with appropriate care.
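
The helper below is a hypothetical illustration of the workaround this single-ROI limitation forces, not code from the paper: run one prompt per class and stitch the resulting binary masks into a single multi-class label map.

```python
import numpy as np

def merge_single_label_masks(binary_masks: dict) -> np.ndarray:
    """Combine {class_id: boolean mask} into one integer label map.
    Later classes overwrite earlier ones where masks overlap."""
    first = next(iter(binary_masks.values()))
    label_map = np.zeros(first.shape, dtype=np.uint8)  # 0 = background
    for class_id, mask in binary_masks.items():
        label_map[mask] = class_id
    return label_map

# Example: 1 = true lumen, 2 = false lumen, 3 = false lumen thrombus,
# with placeholder masks standing in for per-prompt SAM/MedSAM outputs.
masks = {c: np.zeros((512, 512), dtype=bool) for c in (1, 2, 3)}
label_map = merge_single_label_masks(masks)
```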

In our experiments, we evaluated the models on TBAD data under different prompt types and settings. For SAM, we tested point and bounding-box (BB) prompts. MedSAM was assessed with BB prompts, while UniverSeg was tested with varying support set sizes (16, 32, and 64). A UNet model trained on the same dataset served as the baseline, and all models were tested on the same data for direct comparison.
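
A common evaluation protocol for prompt-based models, and an assumption about the details here, is to derive BB prompts from the ground-truth masks and to score predictions with the Dice coefficient. A minimal sketch:

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray) -> np.ndarray:
    """Tight XYXY bounding box around a binary ground-truth mask."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Example: prompt box from GT, then score a (placeholder) prediction.
gt = np.zeros((512, 512), dtype=bool)
gt[150:300, 130:250] = True
box = mask_to_bbox(gt)   # -> array([130, 150, 249, 299])
pred = gt.copy()         # stand-in for a model prediction
print(f"box={box}, dice={dice(pred, gt):.3f}")
```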

Zero-Shot Learning

SAM and MedSAM were evaluated zero-shot: no TBAD-specific training, with segmentation guided only by point or BB prompts.

Few-Shot Learning

UniverSeg was evaluated few-shot: its only task supervision is the labelled support set provided at inference time, as sketched below.
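
The snippet below sketches UniverSeg's few-shot interface as described in its public repository (grayscale 128×128 inputs, a support set of image–mask pairs, one binary label per forward pass); the tensor contents and preprocessing are placeholder assumptions, not the exact setup used here.

```python
import torch
from universeg import universeg  # https://github.com/JJGO/UniverSeg

model = universeg(pretrained=True).eval()

# UniverSeg operates on 128x128 grayscale inputs in [0, 1]; random
# tensors stand in for real CTA slices and annotations here.
B, S, H, W = 1, 64, 128, 128                 # support set size S = 64
target = torch.rand(B, 1, H, W)              # query slice
support_images = torch.rand(B, S, 1, H, W)   # labelled support slices
support_labels = (torch.rand(B, S, 1, H, W) > 0.5).float()  # binary masks

# One forward pass predicts one binary label; repeat per lumen class
# (with a class-specific support set) for multi-label TBAD segmentation.
with torch.no_grad():
    logits = model(target, support_images, support_labels)  # (B, 1, H, W)
pred = (torch.sigmoid(logits) > 0.5).float()
```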

Table 1: Performance comparison of UniverSeg, MedSAM, and UNet on CTA image segmentation

Figure 1: Comparison of input images, ground truth (GT), and predictions by UNet and three FMs. (a) Input image and GT. (b) Prediction by UNet. (c) Prediction by MedSAM using input images with BB coordinates. (d) Prediction by UniverSeg with a support set size of 64. (e) Prediction by SAM using BB coordinates. (f) Prediction by SAM using point coordinates. In the predicted masks and GT, blue regions represent False Lumen Thrombus, red regions denote False Lumen, and green regions correspond to True Lumen.

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number [SFI/12/RC/2289_P2], the Insight SFI Research Centre for Data Analytics. Additionally, this publication has emanated from research conducted with the financial support of Taighde Éireann – Research Ireland under Grant No. 18/CRT/6223.

References

  1. Yao, Z., Xie, W., Zhang, J., Dong, Y., Qiu, H., Yuan, H., ... & Huang, M. (2021). ImageTBAD: A 3D computed tomography angiography image dataset for automatic segmentation of Type-B aortic dissection. Frontiers in Physiology, 12, 732711.
  2. Butoi, V. I., Ortiz, J. J. G., Ma, T., Sabuncu, M. R., Guttag, J., and Dalca, A. V. (2023). UniverSeg: Universal medical image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21438–21451.
  3. Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., et al. (2023). Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026.
  4. Ma, J., He, Y., Li, F., Han, L., You, C., and Wang, B. (2024). Segment anything in medical images. Nature Communications, 15(1), 654.


* This paper has been accepted for presentation at the Irish Machine Vision and Image Processing (IMVIP) Conference 2024.

