Xue Ling, He Shan, Singla Rajeev K, Qin Qiong, Ding Yinglong, Liu Linsheng, Ding Xiaoliang, Bediaga-Bañeres Harbil, Arrasate Sonia, Durado-Sanchez Aliuska, Zhang Yuzhen, Shen Zhenya, Shen Bairong, Miao Liyan, González-Díaz Humberto
Department of Pharmacy, the First Affiliated Hospital of Soochow University.
Department of Pharmacology, Faculty of Medicine, University of the Basque Country (UPV/EHU), Bilbao, Basque Country.
Int J Surg. 2024 Oct 1;110(10):6528-6540. doi: 10.1097/JS9.0000000000001734.
Warfarin is a widely used oral anticoagulant whose effect varies considerably among individuals. Numerous dose-prediction algorithms based on cross-sectional data have been reported, built with multiple linear regression or machine learning. This study aimed to construct an information-fusion perturbation-theory and machine-learning model that predicts warfarin blood levels from clinical longitudinal data of cardiac surgery patients.
Data from 246 patients were obtained from electronic medical records. Continuous variables were processed by computing the deviation of each raw value from its moving average (MA Δv_ki(s_j)), and categorical variables within different attribute groups were processed using the Euclidean distance (ED ‖Δv_k(s_j)‖). Regression and classification analyses were performed on the raw data, the MA Δv_ki(s_j) values, and the ED ‖Δv_k(s_j)‖ values, using several machine-learning algorithms implemented in the STATISTICA and WEKA software packages.
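The two perturbation operators above can be sketched in Python. This is an illustrative reconstruction, not the authors' code: the function names, the moving-average window, and the group-reference vector are assumptions made for the example.

```python
def ma_delta(values, window=3):
    """Deviation of each observation from the moving average of the
    preceding `window` observations: MA Delta v_ki(s_j) = v_ki - <v_ki>.
    The first observation has no history, so its deviation is 0."""
    deltas = []
    for i, v in enumerate(values):
        past = values[max(0, i - window):i] or [v]  # fall back to v itself
        deltas.append(v - sum(past) / len(past))
    return deltas


def euclidean_delta(vec, ref):
    """Euclidean distance ||Delta v_k(s_j)|| between a patient's
    one-hot-encoded categorical vector and the expected (average)
    vector of its attribute group."""
    return sum((a - b) ** 2 for a, b in zip(vec, ref)) ** 0.5
```

For example, `ma_delta([2.0, 2.5, 3.0, 2.8])` yields each INR-like value's deviation from the running mean of the previous readings, turning a longitudinal series into the perturbation terms used as model inputs.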
The random forest (RF) algorithm performed best for predicting continuous outputs from the raw data, with correlation coefficients of 0.978 (training set) and 0.595 (validation set) and mean absolute errors of 0.135 and 0.362, respectively; 59.0% of its predictions were ideal. General discriminant analysis (GDA) performed best for predicting categorical outputs from the MA Δv_ki(s_j) data, with total true-positive rates (TPR) of 95.4% (training set) and 95.6% (validation set).
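The quoted performance figures correspond to standard regression metrics, which can be sketched as below. The ±20% "ideal prediction" criterion is an assumption borrowed from the warfarin-dosing literature, since the abstract does not define the threshold.

```python
def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sum((o - mo) ** 2 for o in obs) ** 0.5
    sp = sum((p - mp) ** 2 for p in pred) ** 0.5
    return cov / (so * sp)


def ideal_fraction(obs, pred, tol=0.20):
    """Fraction of predictions within +/- tol (relative) of the observed
    value -- an assumed definition of an 'ideal' prediction."""
    return sum(abs(p - o) <= tol * o for o, p in zip(obs, pred)) / len(obs)
```

Computed separately on the training and validation sets, these three functions reproduce the kind of numbers reported for the RF model (correlation, MAE, and proportion of ideal predictions).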
An information-fusion perturbation-theory and machine-learning model for predicting warfarin blood levels was established. The RF-based model could be used to predict the target international normalized ratio (INR), and the GDA-based model could be used to predict the probability of being within the target INR range under different clinical scenarios.