Cao Yuepeng, Wang Nannan, Wu Xuxiaochen, Tang Wanxiangfu, Bao Hua, Si Chengshuai, Shao Peng, Li Dongzheng, Zhou Xin, Zhu Dongqin, Yang Shanshan, Wang Fufeng, Su Guoqing, Wang Ke, Wang Qifan, Zhang Yao, Wang Qiangcheng, Yu Dongsheng, Jiang Qian, Bao Jun, Yang Liu
Colorectal Center, The Affiliated Cancer Hospital of Nanjing Medical University, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, Nanjing, China.
Geneseeq Research Institute, Nanjing Geneseeq Technology Inc., Nanjing, China.
Cancer Res. 2024 Oct 1;84(19):3286-3295. doi: 10.1158/0008-5472.CAN-23-3486.
Colorectal cancer is frequently diagnosed in advanced stages, highlighting the need for developing approaches for early detection. Liquid biopsy using cell-free DNA (cfDNA) fragmentomics is a promising approach, but the clinical application is hindered by complexity and cost. This study aimed to develop an integrated model using cfDNA fragmentomics for accurate, cost-effective early-stage colorectal cancer detection. Plasma cfDNA was extracted and sequenced from a training cohort of 360 participants, including 176 patients with colorectal cancer and 184 healthy controls. An ensemble stacked model comprising five machine learning models was employed to distinguish patients with colorectal cancer from healthy controls using five cfDNA fragmentomic features. The model was validated in an independent cohort of 236 participants (117 patients with colorectal cancer and 119 controls) and a prospective cohort of 242 participants (129 patients with colorectal cancer and 113 controls). The ensemble stacked model showed remarkable discriminatory power between patients with colorectal cancer and controls, outperforming all base models and achieving a high area under the receiver operating characteristic curve of 0.986 in the validation cohort. It reached 94.88% sensitivity and 98% specificity for detecting colorectal cancer in the validation cohort, with sensitivity increasing as the cancer progressed. The model also demonstrated consistently high accuracy in within-run and between-run tests and across various conditions in healthy individuals. In the prospective cohort, it achieved 91.47% sensitivity and 95.58% specificity. This integrated model capitalizes on the multiplex nature of cfDNA fragmentomics to achieve high sensitivity and robustness, offering significant promise for early colorectal cancer detection and broad patient benefit. 
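The prospective-cohort percentages can be checked against the cohort sizes with a short calculation. The sketch below is illustrative only: the abstract reports percentages, not a confusion matrix, so the counts of 118 detected patients (of 129) and 108 correctly classified controls (of 113) are inferred here as the whole numbers that reproduce the reported 91.47% and 95.58% after rounding. The function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of cancer patients correctly flagged
    specificity = tn / (tn + fp)  # share of controls correctly cleared
    return sensitivity, specificity


# Prospective cohort sizes from the abstract: 129 patients, 113 controls.
# The split into TP/FN and TN/FP is an ASSUMPTION inferred from the
# reported percentages, not a figure stated in the abstract.
PATIENTS, CONTROLS = 129, 113
ASSUMED_TP, ASSUMED_TN = 118, 108

sens, spec = sensitivity_specificity(
    tp=ASSUMED_TP,
    fn=PATIENTS - ASSUMED_TP,
    tn=ASSUMED_TN,
    fp=CONTROLS - ASSUMED_TN,
)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

Under these assumed counts, the model would have missed 11 of 129 patients and flagged 5 of 113 controls in the prospective cohort.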
Significance: The development of a minimally invasive, efficient approach for early colorectal cancer detection using advanced machine learning to analyze cfDNA fragment patterns could expedite diagnosis and improve treatment outcomes for patients. See related commentary by Rolfo and Russo, p. 3128.
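For readers unfamiliar with stacking, the following is a minimal sketch of an ensemble stacked classifier built from five base learners, using scikit-learn. It is not the authors' pipeline: the abstract does not name the five base models or the five fragmentomic features, so the learners chosen here are placeholders, and the data are synthetic columns generated by `make_classification` rather than cfDNA measurements. Only the cohort sizes (360 for training, 236 held out) are borrowed from the abstract to keep the scale comparable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_stacking_sketch(seed=0):
    """Fit a five-model stacked ensemble on synthetic data; return (model, AUC)."""
    # Synthetic stand-in data: five numeric columns, NOT real fragmentomic features.
    X, y = make_classification(
        n_samples=596, n_features=5, n_informative=3, n_redundant=1,
        random_state=seed,
    )
    # 360 samples for training and 236 held out, mirroring the cohort sizes only.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=236, stratify=y, random_state=seed
    )

    # Placeholder base learners (assumption: the abstract does not list them).
    base_models = [
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("gboost", GradientBoostingClassifier(random_state=seed)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
        ("nbayes", GaussianNB()),
    ]

    # The meta-learner is trained on out-of-fold probabilities from the base
    # models (cv=5), so it never sees predictions made on a model's own
    # training rows.
    stack = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=5,
    )
    stack.fit(X_train, y_train)

    # Area under the ROC curve on the held-out split.
    auc = roc_auc_score(y_val, stack.predict_proba(X_val)[:, 1])
    return stack, auc


if __name__ == "__main__":
    _, auc = run_stacking_sketch()
    print(f"held-out AUC on synthetic data: {auc:.3f}")
```

The AUC printed here reflects only the synthetic data and says nothing about the study's reported 0.986. The point of the sketch is the structure: several base classifiers feed cross-validated probabilities into a single meta-classifier, which is what allows a stacked model to outperform its individual components.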