Department of Radiation Oncology.
Department of Biomedical Data Science.
J Natl Cancer Inst. 2019 Jun 1;111(6):568-574. doi: 10.1093/jnci/djy178.
Oncologists use patients' life expectancy to guide decisions and may benefit from a tool that accurately predicts prognosis. Existing prognostic models generally use only a few predictor variables. We used an electronic medical record dataset to train a prognostic model for patients with metastatic cancer.
The model was trained and tested using 12 588 patients treated for metastatic cancer in the Stanford Health Care system from 2008 to 2017. Data sources included provider note text, labs, vital signs, procedures, medication orders, and diagnosis codes. Patients were divided randomly into a training set used to fit the model coefficients and a test set used to evaluate model performance (80%/20% split). A regularized Cox model with 4126 predictor variables was used. A landmarking approach was used due to the multiple observations per patient, with t0 set to the time of metastatic cancer diagnosis. Performance was also evaluated using 399 palliative radiation courses in test set patients.
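The patient-level split and the landmarking step described above can be sketched as follows. This is an illustration only, not the authors' code. The function names, the random seed, and the example landmark times are hypothetical, and the abstract does not say which landmark times were used.

```python
import random


def split_patients(patient_ids, test_fraction=0.2, seed=0):
    """Randomly assign whole patients to train or test (default 80%/20%).

    Splitting by patient, not by observation, keeps every row from one
    patient on the same side of the split.
    """
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])


def landmark_rows(t_end, event, landmark_times):
    """Build one row per landmark time at which the patient is still followed.

    t_end: time from t0 (metastatic cancer diagnosis) to death or censoring.
    event: True if death was observed, False if censored.
    Returns (landmark_time, residual_follow_up, event) tuples.
    """
    return [(s, t_end - s, event) for s in landmark_times if s < t_end]
```

For example, a patient who died 10 months after t0 contributes rows at hypothetical landmarks 0 and 6 but not 12, with residual survival times of 10 and 4 months. Predictors at each landmark would be computed only from data recorded before that landmark.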
The C-index for overall survival was 0.786 in the test set (averaged across landmark times). For palliative radiation courses, the C-index was 0.745 (95% confidence interval [CI] = 0.715 to 0.775) compared with 0.635 (95% CI = 0.601 to 0.669) for a published model using performance status, primary tumor site, and treated site (two-sided P < .001). Our model's predictions were well-calibrated.
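For readers unfamiliar with the C-index reported above, here is a minimal sketch of Harrell's concordance index for right-censored data. It assumes that a higher risk score means shorter expected survival, and it skips pairs with tied survival times for simplicity. It is not the evaluation code used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter time than patient j. The pair is concordant when
    patient i also has the higher risk score. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.745 versus 0.635 comparison in context.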
The model showed high predictive performance, which will need to be validated using external data. Because it is fully automated, the model can be used to examine providers' practice patterns and could be deployed in a decision support tool to help improve quality of care.