Division of Medical & Surgical Nursing, School of Nursing, Peking University, Beijing, China.
Department of Biostatistics, School of Public Health, Peking University, Beijing, China.
Int J Med Inform. 2022 May;161:104733. doi: 10.1016/j.ijmedinf.2022.104733. Epub 2022 Mar 5.
To develop and validate machine learning (ML) models for cancer-associated deep vein thrombosis (DVT) and to compare the performance of these models with the Khorana score (KS).
We randomly extracted data on 2100 patients with cancer from between Jan. 1, 2017, and Oct. 31, 2019, and enrolled the 1035 who had undergone Doppler ultrasonography. Univariate analysis and Lasso regression were applied to select important predictors. Models were trained and their hyperparameters tuned on 70% of the data using ten-fold cross-validation. The remaining 30% of the data were used to compare five ML models (linear discriminant analysis [LDA], logistic regression [LR], classification tree [CT], random forest [RF], and support vector machine [SVM]) and the KS on seven indicators: area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, balanced accuracy, Brier score, and calibration curve.
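As an illustration of the seven indicators named above, the following is a minimal pure-Python sketch of how they might be computed from held-out labels and predicted probabilities. It is not the authors' code: the function names, the 0.5 classification threshold, and the five equal-width calibration bins are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample size, fine for a sketch.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return _ratio(wins, len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(y_true: Sequence[int], y_prob: Sequence[float],
                     n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Points of a calibration curve: (mean predicted, observed rate, count).

    Uses equal-width probability bins (an assumption); empty bins are skipped.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    return [(sum(p for _, p in b) / len(b), sum(y for y, _ in b) / len(b), len(b))
            for b in bins if b]


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute the six scalar indicators at an assumed decision threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "auc": auc(y_true, y_prob),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "brier": brier_score(y_true, y_prob),
    }
```

Calling `evaluate(y_true, y_prob)` on the 30% hold-out predictions of each model would give one row of a comparison table, and `calibration_bins` gives the points to plot for the calibration curve. A lower Brier score and a curve closer to the diagonal indicate better-calibrated probabilities.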
The incidence of cancer-associated DVT was 22.3%. According to RF, the top five predictors were D-dimer level, age, Charlson Comorbidity Index (CCI), length of stay (LOS), and previous venous thromboembolism (VTE) history. Only LDA (AUC = 0.773) and LR (AUC = 0.772) outperformed KS (AUC = 0.642), and combination with D-dimer improved performance in all models. A nomogram and a web calculator (https://webcalculatorofcancerassociateddvt.shinyapps.io/dynnomapp/) were used to visualize the recommended LR model.
This study developed and validated predictive models for cancer-associated DVT using five ML algorithms and visualized the recommended model with a nomogram and web calculator. These tools may assist doctors and nurses in evaluating individualized cancer-associated DVT risk and in decision-making. However, prospective cohort studies are still needed to externally validate the recommended model.