Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY.
Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY.
JCO Clin Cancer Inform. 2022 Sep;6:e2200014. doi: 10.1200/CCI.22.00014.
Natural language processing (NLP) applied to radiology reports can help identify clinically relevant M1 subcategories in patients with colorectal cancer (CRC). The primary objective was to compare the overall survival (OS) of patients with CRC according to American Joint Committee on Cancer TNM staging and to explore an alternative classification. The secondary objective was to estimate the frequency of metastasis in each organ.
This retrospective study included patients with CRC who underwent computed tomography (CT) of the chest, abdomen, and pelvis between July 1, 2009, and March 26, 2019, at a tertiary cancer center, and whose reports had previously been labeled for the presence or absence of metastasis by an NLP prediction model. Patients were classified as M0, M1a, M1b, or M1c (American Joint Committee on Cancer), or by an alternative classification based on the number of organs with metastasis: M1, a single organ; M2, two organs; M3, three or more organs. Cox regression models were used to estimate hazard ratios, and Kaplan-Meier curves were used to visualize survival under the two M1 subclassifications.
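The two staging schemes and the Kaplan-Meier median can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: each patient is assumed to be reduced to a set of organ labels with NLP-detected metastasis, `"peritoneum"` is a hypothetical label name, and the AJCC mapping follows the 8th-edition rule that peritoneal involvement defines M1c while M1a and M1b are split by one versus two or more organs. The abstract does not say how lymph-node sites were counted as organs.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical organ label; the study's label vocabulary is not given in the abstract.
PERITONEUM = "peritoneum"


def ajcc_m_category(organs: Iterable[str]) -> str:
    """AJCC 8th-edition-style M category from a set of metastatic organs (sketch)."""
    sites = set(organs)
    if not sites:
        return "M0"
    if PERITONEUM in sites:
        return "M1c"  # peritoneal metastasis, alone or with other organs
    return "M1a" if len(sites) == 1 else "M1b"


def organ_count_category(organs: Iterable[str]) -> str:
    """Alternative scheme: M1 = one organ, M2 = two, M3 = three or more."""
    n = len(set(organs))
    if n == 0:
        return "M0"
    return "M3" if n >= 3 else f"M{n}"


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; events[i] is 1 for death, 0 for censoring.

    Returns (time, survival) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings leave the risk set after t
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest death time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

In use, patients would be grouped by `ajcc_m_category` or `organ_count_category` and `kaplan_meier` run on each group's follow-up times. The hand-written estimator only shows the mechanics; in practice a survival library such as lifelines (`KaplanMeierFitter`, and `CoxPHFitter` for the hazard ratios) would typically be used.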
A total of 9,928 patients with 48,408 CT chest, abdomen, and pelvis reports were included. On the basis of the NLP predictions, the median OS for M1a, M1b, and M1c was 4.47, 1.72, and 1.52 years, respectively. The median OS for M1, M2, and M3 was 4.24, 2.05, and 1.04 years, respectively. Metastases occurred most often in the liver (35.8%), abdominopelvic lymph nodes (32.9%), lungs (29.3%), peritoneum (22.0%), thoracic nodes (19.9%), bones (9.2%), and pelvic organs (7.5%). Splenic and adrenal metastases occurred in < 5%.
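Per-organ frequencies like those above could be tabulated roughly as follows. The sketch makes two hypothetical assumptions that the abstract does not state: a patient counts as having metastasis in an organ if any of their CT reports was flagged for that organ by the NLP model, and percentages use all included patients (including M0) as the denominator.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def patient_organs_from_reports(
    reports: Iterable[Tuple[str, Set[str]]],
) -> Dict[str, Set[str]]:
    """Union of NLP-flagged organs across each patient's reports (assumed rule)."""
    organs: Dict[str, Set[str]] = defaultdict(set)
    for patient_id, flagged in reports:
        organs[patient_id] |= flagged  # also registers patients with no flags
    return dict(organs)


def organ_frequencies(patient_organs: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of patients with metastasis in each organ, most frequent first."""
    n = len(patient_organs)
    counts = Counter(o for organs in patient_organs.values() for o in organs)
    return {organ: 100 * c / n for organ, c in counts.most_common()}
```

Aggregating to the patient level before counting keeps a patient with many follow-up scans from being counted more than once per organ.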
NLP applied to a large database of radiology reports can identify clinically relevant metastatic phenotypes and can be used to investigate new M1 substaging for CRC. Patients with metastases in three or more organs had the worst prognosis, with a median OS of approximately 1 year.