Predicting the Survival of Patients With Cancer From Their Initial Oncology Consultation Document Using Natural Language Processing.
Author information
BC Cancer, Vancouver, British Columbia, Canada.
Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada.
Publication information
JAMA Netw Open. 2023 Feb 1;6(2):e230813. doi: 10.1001/jamanetworkopen.2023.0813.
IMPORTANCE
Predicting short- and long-term survival of patients with cancer may improve their care. Prior predictive models either use data with limited availability or predict the outcome of only 1 type of cancer.
OBJECTIVE
To investigate whether natural language processing can predict survival of patients with general cancer from a patient's initial oncologist consultation document.
DESIGN, SETTING, AND PARTICIPANTS
This retrospective prognostic study used data from 47 625 of 59 800 patients who started cancer care at any of the 6 BC Cancer sites located in the province of British Columbia between April 1, 2011, and December 31, 2016. Mortality data were updated until April 6, 2022, and data were analyzed from update until September 30, 2022. All patients with a medical or radiation oncologist consultation document generated within 180 days of diagnosis were included; patients seen for multiple cancers were excluded.
EXPOSURES
Initial oncologist consultation documents were analyzed using traditional and neural language models.
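As an illustration of what a traditional language model can look like, the following minimal sketch converts free text into bag-of-words count vectors. This is an assumption made for illustration only: the abstract does not state which traditional representation was used, the two example sentences are invented, and the helper names (`tokenize`, `build_vocabulary`, `vectorize`) are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs: List[str]) -> List[str]:
    """Sorted list of every distinct token seen in the documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def vectorize(text: str, vocab: List[str]) -> List[int]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


# Invented toy sentences, not taken from any real consultation document.
docs = ["Patient reports mild pain.", "No pain; patient is well."]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
```

Vectors of this kind can be passed to any standard classifier. A neural language model would instead learn its own representation of the text.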
MAIN OUTCOMES AND MEASURES
The primary outcome was the performance of the predictive models, including balanced accuracy and area under the receiver operating characteristic curve (AUC). The secondary outcome was investigating which words the models used.
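For readers unfamiliar with the two performance measures, the sketch below computes them from their standard definitions: balanced accuracy as the mean of sensitivity and specificity, and AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are invented toy values, and the function names are hypothetical; this is not the study's evaluation code.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = survived, 0 = died; scores are predicted survival probabilities.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

ba = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Balanced accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds.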
RESULTS
Of the 47 625 patients in the sample, 25 428 (53.4%) were female and 22 197 (46.6%) were male, with a mean (SD) age of 64.9 (13.7) years. A total of 41 447 patients (87.0%) survived 6 months, 31 143 (65.4%) survived 36 months, and 27 880 (58.5%) survived 60 months, calculated from their initial oncologist consultation. The best models achieved a balanced accuracy of 0.856 (AUC, 0.928) for predicting 6-month survival, 0.842 (AUC, 0.918) for 36-month survival, and 0.837 (AUC, 0.918) for 60-month survival, on a holdout test set. Differences in what words were important for predicting 6- vs 60-month survival were found.
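The survival counts above show why balanced accuracy is a more informative measure than plain accuracy here. The sketch below uses only the counts reported in this abstract and evaluates a hypothetical baseline that predicts "survived" for every patient. The baseline and the function name `naive_baseline` are illustrative assumptions, not part of the study.

```python
from typing import Tuple

N_TOTAL = 47_625
# Patients alive at each horizon (months), as reported in the abstract.
SURVIVED = {6: 41_447, 36: 31_143, 60: 27_880}


def naive_baseline(n_alive: int, n_total: int) -> Tuple[float, float]:
    """Accuracy and balanced accuracy of always predicting 'survived'."""
    n_dead = n_total - n_alive
    sensitivity = n_alive / n_alive  # every survivor is labeled correctly
    specificity = 0 / n_dead         # no death is ever predicted
    accuracy = n_alive / n_total
    return accuracy, 0.5 * (sensitivity + specificity)


baseline = {m: naive_baseline(n, N_TOTAL) for m, n in SURVIVED.items()}
for months, (acc, bal) in baseline.items():
    print(f"{months:>2} months: accuracy={acc:.3f}, balanced accuracy={bal:.3f}")
```

At 6 months this trivial baseline reaches an accuracy of about 0.870 but a balanced accuracy of only 0.5, which puts the reported balanced accuracy of 0.856 in context.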
CONCLUSIONS AND RELEVANCE
These findings suggest that models performed comparably with or better than previous models predicting cancer survival and that they may be able to predict survival using readily available data without focusing on 1 cancer type.