Automatic Segmentation of Nasopharyngeal Carcinoma on CT Images Using Efficient UNet-2.5D Ensemble With Semi-Supervised Pretext Task Pretraining

Domoguen JKL, Manuel JA, Cañal JPA and Naval PC Jr (2022) Automatic segmentation of nasopharyngeal carcinoma on CT images using efficient UNet-2.5D ensemble with semi-supervised pretext task pretraining. Front. Oncol. 12:980312. doi: 10.3389/fonc.2022.980312

ABSTRACT

Nasopharyngeal carcinoma (NPC) is primarily treated with radiation therapy, for which accurate delineation of target volumes and organs at risk is critical. However, manual delineation is time-consuming, variable, and subjective, depending on the experience of the radiation oncologist. This work explores deep learning methods to automate the segmentation of the NPC primary gross tumor volume (GTVp) in planning computed tomography images. A total of 63 patients diagnosed with NPC were included in this study. To tackle the problem of limited data, we propose two sequential approaches. First, we propose a much simpler architecture that follows the UNet design but uses 2D convolutions for 3D segmentation. We highlight its efficacy over more popular and modern architectures by achieving significantly higher performance. To further improve performance, we trained the model on a multi-scale dataset to create an ensemble of models. Building on this architecture, we employ semi-supervised learning, combining 3D rotation and 3D relative-patch-location pretext tasks to pretrain the feature extractor. Because the feature extractor can be frozen after pretraining, the model becomes more efficient in terms of the number of trainable parameters. Finally, the approach is also data-efficient: the pretrained model trained with only a portion of the labelled training data achieved performance very close to that of the model trained with the full labelled dataset.
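The paper's code is not reproduced here; the following is a minimal PyTorch sketch of the 2.5D idea described above: a stack of adjacent axial slices enters a small UNet-style 2D network as input channels, and the network predicts the GTVp mask of the centre slice. The class names, channel widths, network depth, and the four-way rotation pretext head are illustrative assumptions; the multi-scale ensembling and the relative-patch-location task are omitted.

```python
# Minimal sketch, not the authors' implementation: a 2.5D UNet-style network
# that consumes neighbouring CT slices as channels and segments the centre slice.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Two 3x3 2D convolutions, as in the original UNet design.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )


class UNet25D(nn.Module):
    def __init__(self, n_slices=3, base=32):
        super().__init__()
        self.enc1 = conv_block(n_slices, base)        # adjacent slices enter as channels
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)             # GTVp mask logits for the centre slice

    def encode(self, x):
        # Feature extractor; this is the part that can be pretrained and frozen.
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        return e1, e2, b

    def forward(self, x):                             # x: (B, n_slices, H, W)
        e1, e2, b = self.encode(x)
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                          # logits, (B, 1, H, W)


class RotationPretextHead(nn.Module):
    """Illustrative self-supervised head: classify which of 4 rotations was applied."""
    def __init__(self, feat_ch=128, n_rotations=4):
        super().__init__()
        self.cls = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, n_rotations))

    def forward(self, bottleneck_features):
        return self.cls(bottleneck_features)


if __name__ == "__main__":
    model, pretext = UNet25D(n_slices=3), RotationPretextHead(feat_ch=128)
    x = torch.randn(2, 3, 128, 128)                   # 2 samples, 3 adjacent slices each
    print(model(x).shape)                             # torch.Size([2, 1, 128, 128])
    _, _, bottleneck = model.encode(x)
    print(pretext(bottleneck).shape)                  # torch.Size([2, 4]) rotation logits
```

Freezing the pretrained encoder path after pretext training and optimizing only the decoder is what makes the approach parameter-efficient, as noted in the abstract.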

RESEARCH HIGHLIGHTS

  • DICOM images were not used directly for training; the volume arrays extracted from them were. Data was augmented via rotation, flipping, cropping, and transposition, and multi-scale training was applied.
  • Analysis showed that UNet-2.5D achieved a higher Dice similarity coefficient (DSC), intersection over union (IOU), and sensitivity than competing architectures (these overlap metrics are computed as in the sketch after this list).
  • Performance measured by positive predictive value (PPV), relative volume error (RVE), average symmetric surface distance (ASSD), and Hausdorff distance was above average.
  • Compared with other architectures, UNet-2.5D significantly outperforms them in segmenting NPC. Our methods are 4x more efficient in terms of the number of parameters and the amount of data required.
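As a reference for the overlap metrics named in the highlights, here is a minimal NumPy sketch, assuming binary ground-truth and predicted masks of equal shape. The surface-distance metrics (ASSD and Hausdorff distance) require surface extraction and are not shown.

```python
# Minimal sketch of the reported overlap metrics for binary segmentation masks.
import numpy as np


def overlap_metrics(pred, gt, eps=1e-8):
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dsc = 2 * tp / (2 * tp + fp + fn + eps)              # Dice similarity coefficient
    iou = tp / (tp + fp + fn + eps)                      # intersection over union
    sensitivity = tp / (tp + fn + eps)                   # true-positive rate
    ppv = tp / (tp + fp + eps)                           # positive predictive value
    rve = abs(pred.sum() - gt.sum()) / (gt.sum() + eps)  # relative volume error
    return {"DSC": dsc, "IOU": iou, "Sensitivity": sensitivity, "PPV": ppv, "RVE": rve}


if __name__ == "__main__":
    gt = np.zeros((32, 64, 64), dtype=np.uint8)
    gt[10:20, 20:40, 20:40] = 1
    pred = np.roll(gt, 2, axis=1)                        # a slightly shifted prediction
    print(overlap_metrics(pred, gt))
```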

CONTACT

Joie Cañal, MD – jdadevosocanal@up.edu.ph, +639178387569

Training images: normal, unlabelled and labelled

Data augmentation: rotation, flipping and transposition
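A minimal NumPy/SciPy sketch of the augmentations named in this panel, assuming axial volumes of shape (slices, H, W) with matching masks; the angle range and probabilities are illustrative choices, not the authors' settings.

```python
# Minimal sketch: rotation, flipping, and transposition of a CT volume and its mask.
import numpy as np
from scipy.ndimage import rotate


def augment(volume, mask, rng):
    # Random in-plane rotation (nearest-neighbour for the mask so it stays binary).
    angle = rng.uniform(-15, 15)
    volume = rotate(volume, angle, axes=(1, 2), reshape=False, order=1)
    mask = rotate(mask, angle, axes=(1, 2), reshape=False, order=0)
    # Random left-right flip.
    if rng.random() < 0.5:
        volume, mask = np.flip(volume, axis=2).copy(), np.flip(mask, axis=2).copy()
    # Random in-plane transposition (swap H and W).
    if rng.random() < 0.5:
        volume, mask = volume.transpose(0, 2, 1), mask.transpose(0, 2, 1)
    return volume, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.normal(size=(32, 96, 96))
    msk = (rng.random(size=(32, 96, 96)) > 0.99).astype(np.uint8)
    aug_vol, aug_msk = augment(vol, msk, rng)
    print(aug_vol.shape, aug_msk.shape)
```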

Multi-scale training
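One way to read multi-scale training, consistent with the abstract's "ensemble of models", is to train one model per input scale and average their resized predictions at inference. The PyTorch sketch below shows only that ensembling step under that assumption; the function name and scale factors are illustrative, and the authors' exact scheme may differ.

```python
# Minimal sketch: averaging predictions from models trained at different input scales.
import torch
import torch.nn.functional as F


def multiscale_ensemble(models_by_scale, x):
    # x: (B, n_slices, H, W); models_by_scale maps a scale factor to a model
    # trained at that resolution. Predictions are resized back and averaged.
    h, w = x.shape[-2:]
    probs = []
    for scale, model in models_by_scale.items():
        xs = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
        logits = model(xs)
        logits = F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
        probs.append(torch.sigmoid(logits))
    return torch.stack(probs).mean(dim=0)              # averaged probability map


if __name__ == "__main__":
    # Stand-in models (single conv layers) just to exercise the function;
    # in practice these would be the trained UNet-2.5D models, one per scale.
    models = {s: torch.nn.Conv2d(3, 1, 3, padding=1) for s in (0.5, 1.0, 1.5)}
    x = torch.randn(1, 3, 128, 128)
    print(multiscale_ensemble(models, x).shape)        # torch.Size([1, 1, 128, 128])
```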