Global Health Economics, Amgen Inc, Thousand Oaks, CA, USA.
Thinking Machines Data Science, Manila, Philippines.
J Med Econ. 2021 Jan-Dec;24(1):1272-1279. doi: 10.1080/13696998.2021.1999132.
To evaluate the predictive performance of logistic and linear regression versus machine learning (ML) algorithms to identify patients with rheumatoid arthritis (RA) treated with targeted immunomodulators (TIMs) using only pharmacy administrative claims.
Adults aged 18-64 years with ≥1 TIM claim in the IBM MarketScan commercial database were included in this retrospective analysis. The predictive ability of logistic regression to identify RA patients was compared with supervised ML classification algorithms including random forest (RF), decision trees, linear support vector machines (SVMs), neural networks, naïve Bayes classifier, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbors (k-NN). Model performance was evaluated using F1 score, accuracy, precision, sensitivity, area under the receiver operating characteristic curve (AUROC), and Matthews correlation coefficient (MCC). Analyses were conducted in all-patient and etanercept-only samples.
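The evaluation metrics named above can all be derived from a binary confusion matrix, plus predicted scores for AUROC. The following is a minimal sketch of those definitions, not the authors' code: the function names are hypothetical, labels are assumed to be coded 1 for RA and 0 for non-RA, and AUROC is computed with the pairwise (Mann-Whitney) formulation, which may differ from how the study computed it.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (assumes 1 = RA, 0 = non-RA)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return precision, sensitivity, F1, accuracy, and MCC as fractions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if (precision + sensitivity)
        else 0.0
    )
    accuracy = (tp + tn) / len(y_true)
    # MCC is defined as 0 here when any marginal total is zero.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as half. Quadratic in sample size, so this is
    only suitable for small illustrative inputs. Requires both classes."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Calling `classification_metrics(y_true, y_pred)` with the held-out labels and a model's hard predictions, and `auroc(y_true, scores)` with its predicted probabilities, gives values on a 0 to 1 scale (MCC on a -1 to 1 scale); the abstract reports them as percentages.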
In the all-patient sample, ML approaches did not outperform logistic regression. RF showed small improvements over logistic regression that were not considered remarkable (RF vs logistic regression): F1 score (84.55% vs 83.96%), accuracy (84.05% vs 83.79%), sensitivity (84.53% vs 82.20%), AUROC (84.04% vs 83.85%), and MCC (68.07% vs 67.66%). Findings were similar in the etanercept-only sample.
Logistic regression and ML approaches successfully identified patients with RA in a large pharmacy administrative claims database. The ML algorithms were no better than logistic regression at prediction. RF, SVMs, LDA, and the ridge classifier performed comparably to logistic regression, while neural networks, decision trees, the naïve Bayes classifier, and QDA underperformed it in identifying patients with RA.