Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, 76203, USA.
Department of Mathematics, University of North Texas, Denton, TX, 76203, USA.
BMC Bioinformatics. 2022 Jun 13;23(1):229. doi: 10.1186/s12859-022-04783-y.
Despite remarkable advances in cancer research, cancer remains one of the leading causes of death worldwide. Early detection of cancer and localization of its tissue of origin are key to effective treatment. Here, we leverage technological advances in machine learning and artificial intelligence to design a novel framework for cancer diagnostics. Our proposed framework detects cancers and their tissues of origin using a single unified model that encompasses the 33 cancer types represented in The Cancer Genome Atlas (TCGA). The model exploits features learned from the dysregulated epigenomes of the different cancers; these epigenomic changes arise early in carcinogenesis and differ markedly between cancer types and subtypes, and therefore hold great promise for early cancer detection.
Our comprehensive assessment of the proposed model across the 33 tissues of origin demonstrates its ability to detect and classify cancers with high accuracy (overall F-measure > 99%). Furthermore, our model distinguishes cancers across the spectrum, from pre-cancerous lesions to metastatic tumors, and discriminates between hypomethylation changes caused by age-related epigenetic drift and those caused by true cancer.
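The abstract reports an overall F-measure above 99% but does not state how per-class scores are combined into one number. As an illustration only, the sketch below assumes the overall score is a macro average of per-class F1 values (each the harmonic mean of precision and recall) over tissue-of-origin labels. The function name `macro_f_measure` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_f_measure(y_true, y_pred):
    """Macro-averaged F1 over every class seen in y_true or y_pred.

    Assumption: "overall F-measure" is taken here to mean the unweighted
    mean of per-class F1 scores; the paper may aggregate differently.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1  # correct tissue-of-origin call
        else:
            fp[pred_label] += 1  # predicted class gets a false positive
            fn[true_label] += 1  # true class gets a false negative
    f_scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


# Hypothetical example using TCGA-style project codes as class labels.
truth = ["BRCA", "BRCA", "LUAD", "LUAD"]
preds = ["BRCA", "LUAD", "LUAD", "LUAD"]
print(round(macro_f_measure(truth, preds), 4))  # 0.7333
```

Here BRCA scores F1 = 2/3 (precision 1, recall 0.5) and LUAD scores F1 = 0.8 (precision 2/3, recall 1), so the macro average is 11/15. A micro average, or one weighted by class size, would generally give a different value on imbalanced data such as TCGA.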
Beyond detecting primary cancers, our proposed computational model also robustly identifies the tissues of origin of secondary cancers, including metastatic cancers, second primary cancers, and cancers of unknown primary. Our assessment also revealed the model's ability to characterize pre-cancer samples, a significant step forward in early cancer detection. Deployed broadly, this model can deliver accurate diagnoses to a greatly expanded target patient population.