Machine-Learning Assisted Screening of Correlated Covariates: Application to Clinical Data of Desipramine.

Author information

Asiimwe Innocent Gerald, S'fiso Ndzamba Bonginkosi, Mouksassi Samer, Pillai Goonaseelan Colin, Lombard Aurelie, Lang Jennifer

Affiliations

The Wolfson Centre for Personalized Medicine, Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.

APT-Africa Fellowship Program, c/o Pharmacometrics Africa NPC, K45 Old Main Building, Groote Schuur Hospital, Cape Town, South Africa.

Publication information

AAPS J. 2024 May 30;26(4):63. doi: 10.1208/s12248-024-00934-6.

Abstract

Stepwise covariate modeling (SCM) has a high computational burden and can select the wrong covariates. Machine learning (ML) has been proposed as a screening tool to improve the efficiency of covariate selection, but little is known about how to apply ML to actual clinical data. First, we simulated datasets based on clinical data to compare the performance of various ML and traditional pharmacometrics (PMX) techniques, with and without accounting for highly correlated covariates. This simulation step identified the ML algorithm and the number of top-ranked covariates to select when using the actual clinical data. A previously developed desipramine population pharmacokinetic model was used to simulate virtual subjects. Fifteen covariates were considered, four of which had a simulated effect. Based on the F1 score (the harmonic mean of precision and recall), ridge regression was the most accurate ML technique on 200 simulated datasets (F1 score = 0.475 ± 0.231), and its performance almost doubled when highly correlated covariates were accounted for (F1 score = 0.860 ± 0.158). Both results were better than forward selection with SCM (F1 score = 0.251 ± 0.274 without and 0.499 ± 0.381 with correlations accounted for). In terms of computational cost, ridge regression (0.42 ± 0.07 seconds per simulated dataset, 1 thread) was ~20,000 times faster than SCM (2.30 ± 2.29 hours, 15 threads). On the clinical dataset, prescreening with the selected ML algorithm reduced SCM runtime by 42.86% (from 1.75 to 1.00 days) and produced the same final model as SCM alone. In conclusion, we have demonstrated that accounting for highly correlated covariates improves ML prescreening accuracy. The choice of ML method and the proportion of important covariates (unknown a priori) can be guided by simulations.
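
The screening idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy data generator, the effect sizes, the ridge penalty `alpha`, the correlation threshold of 0.8, and the rule that a selected covariate counts as a hit when it is highly correlated with a true covariate are all assumptions made for illustration. Only the overall shape (15 covariates, 4 with an effect, ridge ranking, F1 scoring with and without correlation) follows the abstract.

```python
import numpy as np

def simulate(n=200, p=15, seed=0):
    """Toy data (hypothetical): 15 covariates, 4 with a true effect."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    # Covariate 4 is made a near-duplicate of true covariate 0 (assumption).
    X[:, 4] = X[:, 0] + 0.1 * rng.normal(size=n)
    truth = {0, 1, 2, 3}
    effects = np.array([0.5, 0.4, 0.3, 0.2])  # illustrative effect sizes
    y = X[:, sorted(truth)] @ effects + rng.normal(size=n)
    return X, y, truth

def ridge_coefficients(X, y, alpha=1.0):
    """Closed-form ridge fit on standardized covariates."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ yc)

def top_k(coefs, k):
    """Indices of the k covariates with the largest absolute coefficient."""
    return set(np.argsort(-np.abs(coefs))[:k].tolist())

def f1_score(selected, truth):
    """F1 = harmonic mean of precision and recall on exact matches."""
    tp = len(selected & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def f1_with_correlation(selected, truth, X, threshold=0.8):
    """F1 where a highly correlated proxy (|r| >= threshold) counts as a hit.

    Assumed scoring rule, not taken from the paper.
    """
    r = np.abs(np.corrcoef(X, rowvar=False))
    hit_selected = {s for s in selected if any(r[s, t] >= threshold for t in truth)}
    hit_truth = {t for t in truth if any(r[s, t] >= threshold for s in selected)}
    if not hit_selected or not hit_truth:
        return 0.0
    precision = len(hit_selected) / len(selected)
    recall = len(hit_truth) / len(truth)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    X, y, truth = simulate()
    selected = top_k(ridge_coefficients(X, y), k=4)
    print("selected:", sorted(selected))
    print("F1, exact matches:", f1_score(selected, truth))
    print("F1, correlation-aware:", f1_with_correlation(selected, truth, X))
```

Under this scoring rule the correlation-aware F1 can never be lower than the exact-match F1, because every exact match is also a correlated match. In a prescreening workflow, only the covariates ranked highest by the ML step would then be passed on to SCM.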
