Hueman Matthew, Wang Huan, Henson Donald, Chen Dechang
Department of Surgical Oncology, John P. Murtha Cancer Center, Walter Reed National Military Medical Center, Bethesda, Maryland, USA.
Biostatistics, George Washington University, Washington, District of Columbia, USA.
ESMO Open. 2019 Jun 12;4(3):e000518. doi: 10.1136/esmoopen-2019-000518. eCollection 2019.
The American Joint Committee on Cancer (AJCC) system for staging cancers of the colon and rectum includes depth of tumour penetration, number of positive lymph nodes and presence or absence of metastasis. Using machine learning, we demonstrate that these factors can be integrated with age, carcinoembryonic antigen (CEA) interpretation and tumour location, to form prognostic systems that expand the tumour, lymph node, metastasis (TNM) staging system.
Two datasets on colon and rectal cancers were extracted from the Surveillance, Epidemiology and End Results Programme of the National Cancer Institute. Dataset 1 included three factors (tumour, lymph nodes and metastasis). Dataset 2 contained six factors (tumour, lymph nodes, metastasis, age, CEA interpretation and tumour location). The Ensemble Algorithm for Clustering Cancer Data (EACCD) and the C-index were applied to generate prognostic groups.
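The C-index used above to compare prognostic systems can be illustrated with a minimal sketch of Harrell's concordance index for right-censored survival data. This is not the authors' implementation: the function name `harrell_c_index`, the treatment of risk-group numbers as risk scores, and the choice to skip pairs with tied survival times are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, hypothetical helper).

    times       -- observed follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- higher value means worse predicted prognosis
                   (e.g. the number of the assigned risk group)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Assumption: pairs with tied times are skipped, and a pair is
        # comparable only if the shorter time ended in an observed death.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied risk (same group) counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of patients by survival. Because a staging system assigns many patients to the same group, the half-credit rule for tied risk scores matters a great deal in this setting.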
The EACCD prognostic system based on dataset 1 stratified patients into 10 risk groups, analogous to the 10 stages of the AJCC staging system. There was a strong inter-system association between EACCD grouping and AJCC staging (Spearman's rank correlation=0.9046, p value=1.6×10). However, the EACCD system had a significantly higher survival prediction accuracy than the AJCC system (C-index=0.7802 for the EACCD system and 0.7695 for the AJCC system, p value=4.9×10). Adding age, CEA interpretation or tumour location improved the prediction accuracy of the prognostic system involving tumour, lymph nodes and metastasis. The EACCD prognostic system based on dataset 2 and all six factors stratified patients into 10 groups with the highest survival prediction accuracy (C-index=0.7914).
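The inter-system association reported above is a Spearman rank correlation between two ordinal groupings of the same patients. A minimal sketch of how such a coefficient could be computed when many patients share a group follows. It uses average ranks for ties, and the helper names `average_ranks` and `spearman_rho` are hypothetical rather than taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of values tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Here `x` and `y` would hold, for each patient, the group number under one system and the stage number under the other. The sketch assumes each system assigns at least two distinct groups, since otherwise the denominator is zero.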
The EACCD can integrate multiple factors to stratify patients with colon or rectal cancer into risk groups that predict survival with high accuracy.