Chan Fan-Ying, Ku Yi-En, Lie Wen-Nung, Chen Hsiang-Yin
Department of Clinical Pharmacy, College of Pharmacy, Taipei Medical University, 250 Wuxing St, Xinyi Dist, Taipei, 11031, Taiwan, 886 2-2736-1661.
Department of Electrical Engineering, National Chung Cheng University, Chiayi, Taiwan.
JMIR Form Res. 2025 Apr 10;9:e67767. doi: 10.2196/67767.
Unlike single-snapshot data collection methods, which only identify high-risk patients, machine learning models using time-series data can predict adverse events and aid in the timely management of cancer.
This study aimed to develop and validate machine learning models for sunitinib- and sorafenib-associated thyroid dysfunction using a time-series data collection approach.
Time-series data of patients first prescribed sunitinib or sorafenib were collected from a deidentified clinical research database. Logistic regression, random forest, Adaptive Boosting, Light Gradient-Boosting Machine, and Gradient Boosting Decision Tree were used to develop the models. Prediction performance was compared using accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve, and area under the precision-recall curve. The optimal threshold for the best-performing model was selected based on the maximum F1-score. SHapley Additive exPlanations analysis was conducted to assess feature importance and contributions at both the cohort and patient levels.
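The threshold-selection step described above can be illustrated with a minimal sketch. It assumes binary outcome labels and predicted probabilities are already available. The function names (`f1_at_threshold`, `best_f1_threshold`) and the toy data are hypothetical and are not taken from the study, and the abstract does not say which data set the authors tuned the threshold on.

```python
from typing import List, Tuple


def f1_at_threshold(y_true: List[int], scores: List[float], threshold: float) -> float:
    """F1-score when a case is called positive if its score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    # F1 is undefined with no true or predicted positives; report 0.0 in that case.
    return 2 * tp / denominator if denominator else 0.0


def best_f1_threshold(y_true: List[int], scores: List[float]) -> Tuple[float, float]:
    """Scan every distinct predicted score as a candidate cut-off.

    Returns (threshold, f1) for the highest F1-score; on ties the lowest
    threshold wins because candidates are scanned in ascending order.
    """
    best_threshold, best_f1 = 0.5, -1.0
    for candidate in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, candidate)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1


# Hypothetical toy data: 1 = thyroid dysfunction, 0 = no event.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.35, 0.40, 0.45, 0.60, 0.20, 0.80]

threshold, f1 = best_f1_threshold(y_true, scores)
print(f"best threshold = {threshold:.2f}, F1 = {f1:.3f}")
```

On imbalanced outcomes, a cut-off chosen this way is often well below the default of 0.5. It should be chosen on data separate from the data used to report the final score.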
The training cohort included 609 patients, and the temporal validation cohort included 198 patients. The Gradient Boosting Decision Tree model without resampling outperformed the other models, with an area under the precision-recall curve of 0.600, an area under the receiver operating characteristic curve of 0.876, and an F1-score of 0.583 after threshold adjustment. The SHapley Additive exPlanations analysis identified higher cholesterol levels, longer summed days of medication use, and clear cell adenocarcinoma histology as the most important features. The final model was further integrated into a web-based application.
This model can serve as an explainable adverse drug reaction surveillance system for predicting sunitinib- and sorafenib-associated thyroid dysfunction.