Interpretable machine learning in predicting drug-induced liver injury among tuberculosis patients: model development and validation study.

Affiliations

School of International Pharmaceutical Business, China Pharmaceutical University, Nanjing, Jiangsu, China.

Institute of Tuberculosis Prevention and Control, Ningbo Municipal Center for Disease Control and Prevention, No.237, Yongfeng Road, Ningbo, Zhejiang, China.

Publication information

BMC Med Res Methodol. 2024 Apr 20;24(1):92. doi: 10.1186/s12874-024-02214-5.

Abstract

BACKGROUND

The objective of this research was to create and validate an interpretable prediction model for drug-induced liver injury (DILI) during tuberculosis (TB) treatment.

METHODS

A dataset of TB patients from Ningbo City was used to develop models with the eXtreme Gradient Boosting (XGBoost), random forest (RF), and least absolute shrinkage and selection operator (LASSO) logistic algorithms. Model performance was evaluated with several metrics, including the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR), alongside the decision curve. The Shapley Additive exPlanations (SHAP) method was used to interpret variable contributions in the best-performing model.
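To make the SHAP step concrete, the following is a minimal sketch of exact Shapley values computed by brute force for a single prediction. It is not the study's code: the `toy_risk` function, its coefficients, the feature values, and the reference patient are all hypothetical, and features outside a coalition are simply set to a baseline value, which is one common assumption for defining the value function. SHAP libraries use much faster algorithms for tree models such as XGBoost; this sketch only illustrates the quantity they estimate.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    Features outside a coalition take their baseline value (an assumed,
    commonly used way to define the value of a coalition).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score with an ALT x Tbil interaction (not the study's model)."""
    age, alt, tbil = features
    return 0.01 * age + 0.02 * alt + 0.05 * tbil + 0.001 * alt * tbil


# Hypothetical patient and reference values: age, ALT, Tbil
patient = [60.0, 80.0, 20.0]
reference = [47.0, 30.0, 10.0]

phi = shapley_values(toy_risk, patient, reference)
for name, value in zip(["age", "ALT", "Tbil"], phi):
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that lets SHAP split a single prediction into per-variable contributions.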

RESULTS

A total of 7,071 TB patients were identified from the regional healthcare dataset. The cohort had a median age of 47 years; 68.0% were male, and 16.3% developed DILI. Part of the high-dimensional propensity score (HDPS) method was used to identify relevant variables, yielding 424 variables, of which 37 were selected by LASSO for inclusion in a logistic model. The dataset was then split into training and validation sets at a 7:3 ratio. In the validation set, the XGBoost model showed better overall performance than the RF and LASSO logistic models, with an AUROC of 0.89, an AUPR of 0.75, an F1 score of 0.57, and a Brier score of 0.07. Both the SHAP analysis and the XGBoost model highlighted the contribution of baseline liver-related conditions such as DILI, drug-induced hepatitis (DIH), and fatty liver disease (FLD). Age, alanine transaminase (ALT), and total bilirubin (Tbil) were also associated with DILI status.
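For readers less familiar with the four reported metrics, here is a small standard-library sketch of how each can be computed from true labels and predicted probabilities. The labels and probabilities are made-up toy values, not study data. The 0.5 threshold for F1 is an assumption, since the abstract does not state the cutoff used. The AUPR estimator shown is the step-wise average precision, and this simple version assumes there are no tied scores.

```python
def auroc(y_true, y_prob):
    """Chance that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve (assumes distinct scores)."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each recalled positive
    return total / sum(y_true)


def f1_at_threshold(y_true, y_prob, threshold=0.5):
    """F1 score after dichotomising probabilities at an assumed threshold."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, d in zip(y_true, pred) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(y_true, pred) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(y_true, pred) if y == 1 and d == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy labels (1 = DILI) and predicted probabilities, for illustration only
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

print("AUROC:", round(auroc(y_true, y_prob), 3))
print("AUPR :", round(average_precision(y_true, y_prob), 3))
print("F1   :", round(f1_at_threshold(y_true, y_prob), 3))
print("Brier:", round(brier_score(y_true, y_prob), 3))
```

AUROC and AUPR summarise how well the model ranks patients across all thresholds, F1 depends on one chosen threshold, and the Brier score reflects how close the predicted probabilities are to the observed outcomes. This is one reason a model can have a high AUROC but a moderate F1 when the outcome is uncommon, as with the 16.3% DILI rate here.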

CONCLUSION

XGBoost demonstrated better predictive performance than RF and LASSO logistic in this study. Moreover, the SHAP method improves the clinical interpretability and potential applicability of the model. Future research should include external validation and more detailed feature integration.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e33/11031978/6d5f75adc9ce/12874_2024_2214_Fig1_HTML.jpg
