Konkuk University School of Medicine, South Korea.
Department of Medical Physics at Memorial Sloan Kettering Cancer Center, USA.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa336.
Thyroid nodules are neoplasms commonly found among adults, with papillary thyroid carcinoma (PTC) being the most prevalent malignancy. However, current diagnostic methods often subject patients to unnecessary surgical burden. In this study, we developed and validated an automated, highly accurate, multi-study-derived diagnostic model for PTC using personalized biological pathways coupled with a sophisticated machine learning algorithm. Surprisingly, the algorithm achieved near-perfect performance in discriminating PTCs from non-tumoral thyroid samples, with an overall cross-study-validated area under the receiver operating characteristic curve (AUROC) of 0.999 (95% confidence interval [CI]: 0.995-1) and a Brier score of 0.013 on three independent development cohorts. In addition, the algorithm showed excellent generalizability and transferability on two large-scale external blind PTC cohorts: The Cancer Genome Atlas (TCGA) cohort, which is the largest genomic PTC cohort studied to date, and the post-Chernobyl cohort, which comprises PTCs reported after exposure to radiation from the Chernobyl accident. When applied to the TCGA cohort, the model yielded an AUROC of 0.969 (95% CI: 0.950-0.987) and a Brier score of 0.109. On the post-Chernobyl cohort, it yielded an AUROC of 0.962 (95% CI: 0.918-1) and a Brier score of 0.073. The algorithm is also robust across a variety of other clinical scenarios, discriminating malignant from benign lesions as well as clinically aggressive thyroid cancers with poor prognosis from indolent ones. Furthermore, we discovered novel pathway alterations and prognostic signatures for PTC, which can provide directions for follow-up studies.
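The two metrics reported above, AUROC and the Brier score, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `auroc` and `brier_score` and the toy labels and probabilities are assumptions made purely for illustration. AUROC is computed here as the fraction of (tumor, non-tumor) pairs that the model ranks correctly, with ties counted as half, and the Brier score as the mean squared difference between predicted probability and true label.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive sample outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical toy data: 1 = PTC, 0 = non-tumoral sample.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

if __name__ == "__main__":
    print(f"AUROC: {auroc(toy_labels, toy_probs):.3f}")
    print(f"Brier: {brier_score(toy_labels, toy_probs):.3f}")
```

AUROC depends only on how samples are ranked, whereas the Brier score also penalizes poorly calibrated probabilities. This is one way a model can score a high AUROC on an external cohort and still show a noticeably higher Brier score than on the development cohorts.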