Yang Shuai, Guo Qingfeng, Xing Yaqing, Liu Erjun, Zhao Fugang, Zhang Weiling
Department of Traditional Chinese Medicine, The First Hospital of Hebei Medical University, Shijiazhuang 050091, Hebei, China.
Am J Transl Res. 2024 Dec 15;16(12):7438-7447. doi: 10.62347/TWTG6803. eCollection 2024.
To develop predictive models for assessing the risk of deep vein thrombosis (DVT) in patients with lumbar disc herniation (LDH) and to evaluate their performance.
A retrospective study was conducted on 798 LDH patients treated at the First Hospital of Hebei Medical University from January 2017 to December 2023. Using computer-generated random numbers, the patients were divided into a training set (n = 558) and a test set (n = 240) at a ratio of 7:3. In the training set, patients without DVT formed the non-DVT group (n = 463) and those diagnosed with DVT formed the DVT group (n = 95). Univariate analysis was performed to compare clinical data between the two groups, and the variables that differed significantly were used to develop a Logistic regression model, a Gradient boosting model, and a Random Forest model. Model performance was evaluated with receiver operating characteristic (ROC) curve analysis and calibration curves.
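The abstract does not say how the random numbers were generated or how the 7:3 ratio was rounded. As a minimal sketch, assuming a simple unstratified shuffle with Python's standard `random` module (the function name `split_70_30` and the seed are hypothetical), a 7:3 split of 798 patients yields the reported set sizes when the training size is rounded down (798 × 0.7 = 558.6):

```python
import random


def split_70_30(patient_ids, seed=2024):
    """Randomly assign patients to training and test sets in a 7:3 ratio.

    Hypothetical helper: a plain (unstratified) random split. The seed is an
    arbitrary choice for reproducibility, not a value from the study.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * 0.7)  # 798 * 0.7 = 558.6, truncated to 558
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_70_30(range(1, 799))
print(len(train), len(test))  # 558 240
```

A stratified split, which keeps the DVT proportion equal in both sets, is a common alternative; the abstract does not indicate which approach was used.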
In the training set, univariate analysis revealed significant differences between the non-DVT group and the DVT group in age, platelets (PLT), total cholesterol (TC), triglycerides (TG), glycated hemoglobin (HbA1c), D-dimer (D-D), fibrinogen (FIB), activated partial thromboplastin time (APTT), prothrombin time (PT), and thrombin time (TT) (all P < 0.05). Predictive models were constructed from these indicators. In the training set, the areas under the ROC curves (AUCs), in descending order, were: Random Forest model (0.978) > Gradient boosting model (0.943) > Logistic regression model (0.919). In the test set, the AUCs were: Random Forest model (0.952) > Gradient boosting model (0.941) > Logistic regression model (0.908). The DeLong test showed that, in the training set, the AUC of the Random Forest model was significantly higher than that of the Logistic regression model (P < 0.05), whereas the other pairwise comparisons showed no significant difference. Calibration curves showed that the predicted probabilities of all three models agreed closely with the observed DVT incidence in both the training and test sets.
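The DeLong test compares two AUCs measured on the same patients while accounting for the correlation between them. The sketch below implements the standard nonparametric DeLong variance estimate in plain Python; it is an illustration, not the software used in the study, and the function names and toy scores are hypothetical. Each AUC is the Mann-Whitney estimate: the probability that a randomly chosen DVT patient receives a higher predicted risk than a randomly chosen non-DVT patient, with ties counted as one half.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _structural_components(scores, labels):
    """Return the AUC plus DeLong's per-case structural components."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # DVT cases
    neg = [s for s, y in zip(scores, labels) if y == 0]  # non-DVT cases
    # V10: for each positive case, the fraction of negatives it outranks
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    # V01: for each negative case, the fraction of positives that outrank it
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    auc = sum(v10) / len(v10)
    return auc, v10, v01


def _sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc_a, auc_b, z, p_value). Requires at least two positive
    and two negative cases.
    """
    auc_a, v10_a, v01_a = _structural_components(scores_a, labels)
    auc_b, v10_b, v01_b = _structural_components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var_a = _sample_cov(v10_a, v10_a) / m + _sample_cov(v01_a, v01_a) / n
    var_b = _sample_cov(v10_b, v10_b) / m + _sample_cov(v01_b, v01_b) / n
    cov_ab = _sample_cov(v10_a, v10_b) / m + _sample_cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value


# Toy example with made-up scores for four patients (1 = DVT, 0 = no DVT).
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]
model_b = [0.8, 0.7, 0.3, 0.2]
print(delong_test(model_a, model_b, labels))
```

In the toy example, model A has an AUC of 0.75 and model B an AUC of 1.0. This direct double loop costs O(m·n) per model, which is adequate for cohorts of a few hundred patients such as the one described here.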
The Logistic regression, Gradient boosting, and Random Forest models constructed in this study show good value for predicting DVT in LDH patients and can help optimize clinical management. Of the three, the Random Forest model performed best.