Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation.

Affiliations

Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan.

Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan.

Publication information

J Med Internet Res. 2023 Feb 7;25:e43734. doi: 10.2196/43734.

Abstract

BACKGROUND

Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches that predict adverse effects without accounting for how features change over time have yielded suboptimal predictions. Machine learning algorithms trained on data sets collected at multiple time points may predict adverse effects more accurately.

OBJECTIVE

We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds.

METHODS

This study developed machine learning models using multicenter, delinked electronic health records of patients who received amiodarone from January 2013 to December 2017. The training set comprised data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital served as the external test set. The study collected stationary features at baseline and dynamic features at the 1st, 2nd, 3rd, 6th, 9th, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods: oversampling with the borderline synthetic minority oversampling technique, undersampling with edited nearest neighbor, and a hybrid of over- and undersampling. Model performance was compared on accuracy, precision, recall, F-score, geometric mean (G-mean), area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined from the best model. The decision threshold was readjusted to identify the best cutoff value, and a Kaplan-Meier survival analysis was performed.
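As a rough illustration of the threshold-dependent metrics and the cutoff readjustment described above, the following minimal sketch computes accuracy, precision, recall, F-score, and G-mean from confusion counts and then scans candidate cutoffs for the one with the highest F-score. It is not the study's code: the function names, the toy labels and scores, and the candidate grid are all hypothetical, and it assumes the common definitions of F-score (harmonic mean of precision and recall) and G-mean (square root of sensitivity times specificity). AUROC and AUPRC are omitted for brevity.

```python
import math


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics; undefined ratios fall back to 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / denom if denom else 0.0,
        # Assumed definition: sqrt(sensitivity * specificity).
        "g_mean": math.sqrt(recall * specificity),
    }


def best_threshold(y_true, scores, candidates):
    """Return (cutoff, F-score) with the highest F-score; ties keep the first."""
    best_cutoff, best_f = None, -1.0
    for cutoff in candidates:
        f = metrics(*confusion_counts(y_true, scores, cutoff))["f_score"]
        if f > best_f:
            best_cutoff, best_f = cutoff, f
    return best_cutoff, best_f


# Toy, made-up labels (1 = event) and predicted risks; not study data.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.55, 0.60, 0.65, 0.70, 0.90]

default = metrics(*confusion_counts(y_true, scores, 0.5))
cutoff, f_at_cutoff = best_threshold(y_true, scores, [i / 20 for i in range(1, 20)])
print(default["f_score"], cutoff, round(f_at_cutoff, 3))
```

On this toy data, moving the cutoff away from the default 0.5 raises the F-score, which is the same kind of gain the readjustment step aims for on imbalanced outcomes.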

RESULTS

The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting model with oversampling demonstrated the best predictive outcomes among all 16 models. Its accuracy, precision, recall, F-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After the cutoff was readjusted, the best value was 0.627, and the F-score reached 0.699. The best threshold classified 286 of 2422 patients (11.8%) as high-risk subjects, among whom 275 were true-positive patients in the test set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxine, alkaline phosphatase, and low-density lipoprotein were the most important features.
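The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the event rates from the patient counts and the F-score from the reported precision and recall, assuming the F-score is the harmonic mean of the two. The implied specificity is a derived quantity that the abstract does not report: it assumes G-mean is the square root of recall times specificity and is only approximate because the inputs are rounded. Variable names are illustrative.

```python
# Counts and metrics as reported in the abstract.
train_events, train_total = 583, 4075
test_events, test_total = 275, 2422
high_risk = 286
precision, recall, g_mean = 0.632, 0.756, 0.845

# Event rates and the share of test patients flagged as high risk, in percent.
train_rate = round(100 * train_events / train_total, 1)   # 14.3
test_rate = round(100 * test_events / test_total, 1)      # 11.4
high_risk_rate = round(100 * high_risk / test_total, 1)   # 11.8

# F-score as the harmonic mean of precision and recall.
f_score = round(2 * precision * recall / (precision + recall), 3)  # 0.688

# Assumption: G-mean = sqrt(recall * specificity), so specificity is
# roughly g_mean**2 / recall (approximate, since inputs are rounded).
implied_specificity = g_mean ** 2 / recall

print(train_rate, test_rate, high_risk_rate, f_score, round(implied_specificity, 3))
```

The recomputed percentages and F-score match the values in the abstract to the reported precision.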

CONCLUSIONS

Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d802/9944157/ec0c492bde71/jmir_v25i1e43734_fig1.jpg
