Machine learning-based prediction model for the efficacy and safety of statins.

Authors

Xiong Yu, Liu Xiaoyang, Wang Qing, Zhao Li, Kong Xudong, Da Chunhe, Meng Zuohuan, Qu Leilei, Xia Qinfang, Liu Lihong, Li Pengmei

Affiliations

Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Department of Pharmacy, China-Japan Friendship Hospital, Beijing, China.

Publication details

Front Pharmacol. 2024 Jul 29;15:1334929. doi: 10.3389/fphar.2024.1334929. eCollection 2024.

Abstract

OBJECTIVE

The appropriate use of statins plays a vital role in reducing the risk of atherosclerotic cardiovascular disease (ASCVD). However, due to changes in diet and lifestyle, there has been a significant increase in the number of individuals with high cholesterol levels. Therefore, it is crucial to ensure the rational use of statins. Adverse reactions associated with statins, including liver enzyme abnormalities and statin-associated muscle symptoms (SAMS), have impacted their widespread utilization. In this study, we aimed to develop a predictive model for statin efficacy and safety based on real-world clinical data using machine learning techniques.

METHODS

We preprocessed the dataset with several techniques, including improved random forest imputation for missing values and Borderline SMOTE oversampling for class imbalance. The Boruta method was used for feature selection, and the dataset was divided into training and testing sets in a 7:3 ratio. Five algorithms (logistic regression, naive Bayes, decision tree, random forest, and gradient boosting decision tree) were used to construct the predictive models. Ten-fold cross-validation and bootstrap sampling were performed for internal and external validation. Additionally, SHAP (SHapley Additive exPlanations) was used to interpret feature contributions. Finally, an accessible web-based platform for predicting statin efficacy and safety was built on the optimal predictive model.
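The 7:3 split and ten-fold cross-validation described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the helper names `stratified_split` and `k_fold`, the fixed seeds, the stratification by class, and the toy labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold(indices, k=10, seed=0):
    """Yield (fit_idx, val_idx) pairs; each index is validated exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, val


# Toy labels: 60 negatives and 40 positives (made-up data, not from the study).
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
folds = list(k_fold(train_idx, k=10))
```

In a setup like this, the cross-validation folds are drawn from the training indices only, so the held-out 30% is never seen during model selection. Whether the study applied its oversampling step before or after splitting is not stated in the abstract.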

RESULTS

The random forest algorithm performed best among the five algorithms. The predictive models for LDL-C target attainment (AUC = 0.883, Accuracy = 0.868, Precision = 0.858, Recall = 0.863, F1 = 0.860, AUPRC = 0.906, MCC = 0.761), liver enzyme abnormalities (AUC = 0.964, Accuracy = 0.964, Precision = 0.967, Recall = 0.963, F1 = 0.965, AUPRC = 0.978, MCC = 0.938), and muscle pain/creatine kinase (CK) abnormalities (AUC = 0.981, Accuracy = 0.980, Precision = 0.987, Recall = 0.975, F1 = 0.981, AUPRC = 0.987, MCC = 0.965) all demonstrated favorable performance. The most important features of the LDL-C target attainment model were cerebral infarction, triglycerides (TG), platelets (PLT), and high-density lipoprotein (HDL). The most important features of the liver enzyme abnormalities model were C-reactive protein (CRP), CK, and the number of oral medications. Similarly, aspartate aminotransferase (AST), alanine aminotransferase (ALT), PLT, and the number of oral medications were important features for muscle pain/CK abnormalities. Based on the best-performing predictive model, a user-friendly web application was designed and implemented.
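For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall, F1, the Matthews correlation coefficient (MCC), and the AUC are conventionally computed for a binary classifier. The function names `binary_metrics` and `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the study, and AUPRC is omitted for brevity.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute threshold-based metrics from the 2x2 confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when classes are imbalanced or have been rebalanced by oversampling.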

CONCLUSION

This study presented a machine learning-based predictive model for statin efficacy and safety. The platform developed can assist in guiding statin therapy decisions and optimizing treatment strategies. Further research and application of the model are warranted to improve the utilization of statin therapy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c6b/11317424/edaec1ca9b16/fphar-15-1334929-g001.jpg