Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China.
Department of Maxillofacial and Otorhinolaryngology Oncology, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China.
Mol Oncol. 2024 Nov;18(11):2755-2769. doi: 10.1002/1878-0261.13745. Epub 2024 Oct 8.
Early cancer diagnosis from bisulfite-treated cell-free DNA (cfDNA) fragments requires tedious data analysis procedures. Here, we present a deep-learning-based approach for early cancer interception and diagnosis (DECIDIA) that achieves accurate cancer diagnosis exclusively from bisulfite-treated cfDNA sequencing fragments. DECIDIA relies on transformer-based representation learning of DNA fragments and weakly supervised multiple-instance learning for classification. We systematically evaluated the performance of DECIDIA for cancer diagnosis and cancer type prediction on a curated dataset of 5389 samples comprising colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non-cancer controls (n = 1980). In distinguishing cancer patients from cancer-free controls under 10-fold cross-validation on the CRC dataset, DECIDIA achieved an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976-0.984), outperforming benchmarked methods based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896-0.924) in distinguishing HCC patients from cancer-free controls on an independent external HCC test set, even though no HCC data were used in model development. For cancer-type classification, DECIDIA achieved a micro-average AUROC of 0.963 (95% CI, 0.960-0.966) and an overall accuracy of 82.8% (95% CI, 81.8-83.9). In addition, we distilled from the raw sequencing reads four sequence signatures that exhibited differential patterns between cancer and control samples and among cancer types. Our approach represents a new paradigm for eliminating the tedious data analysis procedures in liquid biopsy based on the bisulfite-treated cfDNA methylome.
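The abstract names DECIDIA's two stages (a transformer that embeds each DNA fragment, then weakly supervised multiple-instance learning, MIL, that labels a whole sample from its bag of fragments) but does not specify how fragment embeddings are aggregated. The sketch below assumes attention-based MIL pooling, a common choice for weak supervision, and is illustrative only, not the authors' implementation. The fragment embeddings, the weights `V`, `w`, `c`, `b`, and the function `attention_mil_predict` are all hypothetical; in practice the embeddings would come from the transformer and the weights would be learned from sample-level (cancer vs. control) labels only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_predict(
    fragment_embeddings: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
    c: Sequence[float],
    b: float,
) -> Tuple[float, Vector]:
    """Hypothetical attention-MIL head: return (P(cancer), attention per fragment)."""
    # 1. Score each fragment (instance) as w^T tanh(V h_k)
    scores = [dot(w, [math.tanh(x) for x in matvec(V, h)]) for h in fragment_embeddings]
    # 2. Normalise the scores into attention weights over the bag (sample)
    attn = softmax(scores)
    # 3. Pool the fragments into one sample-level embedding (weighted sum)
    dim = len(fragment_embeddings[0])
    z = [sum(a * h[j] for a, h in zip(attn, fragment_embeddings)) for j in range(dim)]
    # 4. Logistic classifier on the pooled embedding
    prob = 1.0 / (1.0 + math.exp(-(dot(c, z) + b)))
    return prob, attn


# Hypothetical 2-D embeddings for three fragments of one cfDNA sample
frags = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0]]  # toy weights; learned during training in practice
w = [1.0, -1.0]
c = [2.0, -2.0]
prob, attn = attention_mil_predict(frags, V, w, c, b=0.0)
print(f"P(cancer) = {prob:.3f}, attention = {[round(a, 3) for a in attn]}")
```

Because the attention weights sum to one over the bag and pooling is a weighted sum, the prediction does not depend on fragment order, which suits a sample represented as an unordered set of reads. The attention weights also show which fragments drove a call, one possible (not necessarily the authors') way to inspect which reads matter.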