Liu S Y, Li Z Q, Dou L Z, Zhang Y M, Liu Y, Liu Y M, Ke Y, Liu X D, Wu H R, Chu J T, He S, Wang G Q
Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China.
Department of General Surgery & Obesity and Metabolic Disease Center, China-Japan Friendship Hospital, Beijing 100029, China.
Zhonghua Zhong Liu Za Zhi. 2024 Jun 23;46(6):549-565. doi: 10.3760/cma.j.cn112152-20231207-00353.
This study aimed to develop and validate predictive models for esophageal squamous cell carcinoma (ESCC) based on terminal motif analysis of circulating cell-free DNA (cfDNA), with the goal of improving non-invasive detection of early-stage ESCC and its precancerous lesions.
Between August 2021 and November 2022, we prospectively collected plasma samples from 448 individuals at the Department of Endoscopy, Cancer Hospital, Chinese Academy of Medical Sciences, for cfDNA extraction, library construction, and sequencing. We analyzed 201 cases of ESCC, 46 of high-grade intraepithelial neoplasia (HGIN), 46 of low-grade intraepithelial neoplasia (LGIN), 176 of benign esophageal lesions, and 29 healthy controls. ESCC patients and control subjects were randomly assigned to a training set (n=284) and a validation set (n=122). In the training set, the cfDNA terminal motif matrices were z-score normalized, and features that distinguished ESCC cases from controls were selected. A random forest classifier, Motif-1 (M1), was then developed using principal component analysis, ten-fold cross-validation, and recursive feature elimination, and its performance was evaluated in the validation set and in the precancerous lesion set. Subsequently, individuals with precancerous lesions were added to the dataset, and all participants were randomly re-allocated to new training (n=243), validation (n=105), and test (n=150) cohorts. Using the same procedure as for M1, we trained a second random forest model, Motif-2 (M2), on the training cohort. The validation cohort was used to establish the optimal threshold for M2, and the test cohort was used for final evaluation.
We developed two cfDNA terminal motif-based predictive models for ESCC and its associated precancerous lesions. The first model, M1, achieved a sensitivity of 90.0%, a specificity of 77.4%, and an area under the curve (AUC) of 0.884 in the validation cohort. For LGIN, HGIN, and T1aN0 stage ESCC, the sensitivities of M1 were 76.1%, 80.4%, and 91.2%, respectively, and its combined sensitivity for HGIN and T1aN0 ESCC was 85.0%. Both predictive accuracy and sensitivity increased with disease progression (P<0.001). The second model, M2, achieved a sensitivity of 87.5%, a specificity of 77.4%, and an AUC of 0.857 in the test cohort. The sensitivities of M2 for precancerous lesions and ESCC were 80.0% and 89.7%, respectively, and its combined sensitivity for HGIN and T1aN0 stage ESCC was 89.4%.
Two predictive models based on cfDNA terminal motif analysis were developed for ESCC and its precancerous lesions. Both show high sensitivity and specificity in identifying ESCC and its precancerous stages, indicating their potential for early ESCC detection.
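The abstract does not describe how the terminal motif matrix was built, so the following is only a minimal sketch of one common approach: counting the k-mer at the 5' end of each cfDNA fragment, converting counts to per-sample frequencies, and z-score normalizing each motif with statistics learned from the training samples alone. The motif length (k=4), the function names, and the toy fragments are all assumptions made for illustration and are not taken from the study.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

K = 4  # assumed motif length; the abstract does not state it
MOTIFS = ["".join(p) for p in product("ACGT", repeat=K)]  # 256 motifs when K = 4


def motif_frequencies(fragments):
    """Relative frequency of each K-mer observed at the 5' end of the fragments."""
    counts = Counter(frag[:K].upper() for frag in fragments if len(frag) >= K)
    # Only unambiguous A/C/G/T motifs contribute (ends containing N are ignored).
    total = sum(counts[m] for m in MOTIFS)
    if total == 0:
        return [0.0] * len(MOTIFS)
    return [counts[m] / total for m in MOTIFS]


def fit_zscore(train_rows):
    """Per-motif mean and standard deviation, computed on training samples only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    return means, stds


def apply_zscore(row, means, stds):
    """Standardize one sample with the statistics learned on the training set."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


if __name__ == "__main__":
    # Toy fragment sequences, purely illustrative (not study data).
    sample_a = ["CCCAGGT", "CCCATTA", "AAAATTT", "CCCAGGG"]
    sample_b = ["AAAACCC", "AAAAGGG", "CCCATTT", "AAAATTT"]
    train = [motif_frequencies(sample_a), motif_frequencies(sample_b)]
    means, stds = fit_zscore(train)
    z = apply_zscore(train[0], means, stds)
    idx = MOTIFS.index("CCCA")
    print(f"CCCA frequency in sample A: {train[0][idx]:.2f}")
    print(f"CCCA z-score in sample A: {z[idx]:.2f}")
```

Fitting the normalization on the training set and reusing the same means and standard deviations for validation and test samples keeps information from held-out samples out of model development, which matches the train/validation/test separation described in the abstract.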
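The abstract reports sensitivity, specificity, and AUC, and states that the validation cohort was used to establish an optimal threshold, but it does not say which criterion defined "optimal". The sketch below assumes Youden's J statistic (sensitivity + specificity - 1) as that criterion and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, labels, and scores are hypothetical and are not study data.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC via pairwise comparison of cases and controls (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(labels, scores):
    """Observed score that maximizes Youden's J (an assumed criterion)."""
    def youden(t):
        sens, spec = sensitivity_specificity(labels, scores, t)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


if __name__ == "__main__":
    # Hypothetical validation-cohort labels (1 = case, 0 = control) and model scores.
    val_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    val_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
    threshold = best_threshold(val_labels, val_scores)

    # Hypothetical test-cohort data, evaluated with the threshold fixed beforehand.
    test_labels = [1, 1, 0, 0]
    test_scores = [0.8, 0.3, 0.5, 0.2]
    sens, spec = sensitivity_specificity(test_labels, test_scores, threshold)
    print(f"threshold={threshold}, test sensitivity={sens:.2f}, "
          f"test specificity={spec:.2f}, validation AUC={auc(val_labels, val_scores):.3f}")
```

Choosing the threshold on the validation cohort and then applying it unchanged to the test cohort mirrors the three-cohort design described for M2, where the test cohort serves only for final evaluation.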