Kulkarni Chaitanya, Sherkhane Umesh, Jaiswar Vinay, Mithun Sneha, Mysore Siddu Dinesh, Rangarajan Venkatesh, Dekker Andre, Traverso Alberto, Jha Ashish, Wee Leonard
Philips Research, Philips Innovation Campus, Bengaluru, Karnataka 560045, India.
Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht 6229 ET, The Netherlands.
BJR Open. 2023 Dec 12;6(1):tzad008. doi: 10.1093/bjro/tzad008. eCollection 2024 Jan.
Radiation therapy for lung cancer requires a gross tumour volume (GTV) to be carefully outlined by a skilled radiation oncologist (RO) so that a high radiation dose can be accurately directed to a malignant mass while radiation damage to adjacent normal tissues is minimized. This is manually intensive and tedious; however, it is feasible to train a deep learning (DL) neural network that could assist ROs in delineating the GTV. Yet a DL model trained on large openly accessible data sets might not perform well when applied to a superficially similar task in a different clinical setting. In this work, we tested the performance of a DL automatic lung GTV segmentation model, trained on open-access Dutch data, on Indian patients from a large public tertiary hospital. We hypothesized that DL performance could be improved for a specific clinical context by means of modest transfer learning on a small, representative local subset.
X-ray computed tomography (CT) series from a public data set called "NSCLC-Radiomics" in The Cancer Imaging Archive were first used to train a DL-based lung GTV segmentation model (Model 1). Its performance was assessed on a different open-access data set of Dutch subjects (Interobserver1) and on a private Indian data set from a local tertiary hospital (Test Set 2). Another Indian data set (Retrain Set 1) was used to fine-tune Model 1 by transfer learning. The Indian data sets were acquired on the CT component of a hybrid scanner in a nuclear medicine department, but the GTVs were delineated by skilled Indian ROs. The final, fine-tuned model (Model 2) was then re-evaluated on "Interobserver1" and "Test Set 2." The Dice similarity coefficient (DSC), precision, and recall were used as geometric segmentation performance metrics.
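The three geometric metrics named above can be computed from the voxel overlap between a predicted and a reference GTV mask. The following is a minimal sketch, not the study's evaluation code. It assumes binary masks stored as nested Python lists indexed [slice][row][column] and uses the common definitions DSC = 2TP/(2TP+FP+FN), precision = TP/(TP+FP), and recall = TP/(TP+FN). The function names are hypothetical, and reporting undefined ratios (e.g. two empty masks) as NaN is an assumed convention.

```python
import math
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
Mask = List[List[List[int]]]  # hypothetical layout: binary mask indexed [slice][row][column]


def mask_to_voxels(mask: Mask) -> Set[Voxel]:
    """Collect the (slice, row, column) indices of all foreground voxels."""
    return {
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    }


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (e.g. both masks empty) are reported as NaN: an assumed convention.
    return num / den if den else math.nan


def overlap_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Dice similarity coefficient, precision and recall of a predicted mask against a reference."""
    p, t = mask_to_voxels(pred), mask_to_voxels(truth)
    tp = len(p & t)  # voxels in both the prediction and the reference
    fp = len(p - t)  # predicted voxels outside the reference
    fn = len(t - p)  # reference voxels the prediction missed
    return {
        "dsc": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


# Toy single-slice example: the reference is a 2x2 block; the prediction
# covers half of it plus one extra voxel outside it.
truth = [[[1, 1, 0],
          [1, 1, 0]]]
pred = [[[1, 1, 1],
         [0, 0, 0]]]
scores = overlap_metrics(pred, truth)
print(scores)
```

Here TP = 2, FP = 1 and FN = 2, so DSC = 4/7, precision = 2/3 and recall = 1/2. Precision penalizes over-segmentation and recall penalizes under-segmentation, while DSC balances the two.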
Model 1, trained exclusively on Dutch scans, showed a significant drop in performance when tested on "Test Set 2." However, when Model 2 was evaluated on the same test set, its DSC recovered by 14 percentage points. Precision and recall showed a similar rebound after transfer learning, despite the comparatively small fine-tuning sample. On "Interobserver1," segmentation performance did not change significantly between the models before and after fine-tuning.
A large public open-access data set was used to train a generic DL model for lung GTV segmentation, but this model initially did not perform well in the Indian clinical context. Using transfer learning, it was feasible to fine-tune the generic model efficiently and easily with only a small number of local examples from the Indian hospital. This recovered part of the geometric segmentation performance, while the tuning did not appear to affect the model's performance on another open-access data set.
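The core idea of fine-tuning can be illustrated with a toy sketch: continue training from pretrained parameters on a small local subset rather than starting from scratch. This is deliberately not the authors' pipeline. A one-dimensional logistic-regression classifier stands in for the DL segmentation network. The data, the shifted decision boundary (a hypothetical stand-in for local differences in acquisition and delineation preference), the learning rate, and the epoch counts are all assumptions. The names `train`, `accuracy`, `model_1` and `model_2` are illustrative only, and whether the study froze any network layers is not stated here.

```python
import math
from typing import List, Tuple

Params = Tuple[float, float]  # (w, b) for p(y=1 | x) = sigmoid(w * x + b)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(xs: List[float], ys: List[int], params: Params = (0.0, 0.0),
          lr: float = 1.0, epochs: int = 1000) -> Params:
    """Full-batch gradient descent on mean binary cross-entropy.
    Passing pretrained `params` warm-starts training, i.e. fine-tunes."""
    w, b = params
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of BCE w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def accuracy(params: Params, xs: List[float], ys: List[int]) -> float:
    w, b = params
    correct = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in zip(xs, ys))
    return correct / len(xs)


# Hypothetical large "source" data set: positives above x = 0.505.
src_x = [i / 100 for i in range(100)]
src_y = [1 if i > 50 else 0 for i in range(100)]
# Hypothetical local test set: same inputs, boundary shifted to x = 0.705.
tgt_x = src_x
tgt_y = [1 if i > 70 else 0 for i in range(100)]
# Hypothetical small local fine-tuning subset (10 samples).
tune_x = [i / 10 for i in range(10)]
tune_y = [1 if i >= 8 else 0 for i in range(10)]

model_1 = train(src_x, src_y, epochs=2000)                 # generic model
model_2 = train(tune_x, tune_y, params=model_1, epochs=500)  # warm start

acc_before = accuracy(model_1, tgt_x, tgt_y)
acc_after = accuracy(model_2, tgt_x, tgt_y)
print(f"local accuracy before: {acc_before:.2f}, after fine-tuning: {acc_after:.2f}")
```

In this toy, the fine-tuned model shifts its decision boundary toward the local labels, so its accuracy on the shifted test grid should rise. The sketch is not meant to reproduce the study's finding that performance on the original open-access data set was unaffected.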
Caution is needed when using models trained on large volumes of international data in a local clinical setting, even when that training data set is of good quality. Minor differences in scan acquisition and clinician delineation preferences may result in an apparent drop in performance. However, DL models have the advantage that they can be efficiently "adapted" from a generic to a locally specific context through modest fine-tuning by transfer learning on a small local institutional data set.