Predicting positive test results using large-scale longitudinal data of demographics and medication history.

Author information

Pham Anh, El-Kareh Robert, Myers Frank, Ohno-Machado Lucila, Kuo Tsung-Ting

Affiliations

Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, USA.

UCSD Health System, San Diego, CA, USA.

Publication information

Heliyon. 2024 Dec 18;11(1):e41350. doi: 10.1016/j.heliyon.2024.e41350. eCollection 2025 Jan 15.

DOI:10.1016/j.heliyon.2024.e41350

PMID:39958729

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11825254/

Abstract

BACKGROUND

Infection is a major health threat. Healthcare institutions have strong medical and financial incentives to keep infections under control. Blanket testing at admission is generally not recommended, and existing predictive models have used only moderate sample sizes, included an inflated number of covariates, or relied on non-interpretable algorithms. We aim to develop models that use patient data to predict positive test results with high discrimination performance, interpretable results, and a reasonable number of covariates that reflect health over a long time span.

MATERIALS AND METHODS

We processed records from 157,493 University of California San Diego Health patients seen between January 1, 2016 and July 3, 2019 who had at least 6 months of medication history, excluding pregnant women, patients under 18, and prisoners. Three models (Logistic Regression, Random Forest, and Ensemble) were constructed using hyperparameters selected through 10-fold cross-validation. Model performance was measured by the Area Under the Receiver Operating Characteristic Curve (AUROC). Odds ratios and p-values of the model coefficients were calculated for the Logistic Regression model, and Gini indices for the Random Forest model. Decision boundary analysis was conducted using the pair-wise false positive and false negative cases that each model would predict at a specific threshold.
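The evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how the Ensemble combines its members, so the sketch assumes a simple average of predicted probabilities, and it reads the pair-wise error comparison as a set comparison of misclassified cases between two models. The function names, the toy labels and scores, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(*model_scores: Sequence[float]) -> List[float]:
    """Assumed ensemble rule: unweighted mean of each model's probability."""
    return [sum(vals) / len(vals) for vals in zip(*model_scores)]


def errors_at_threshold(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[Set[int], Set[int]]:
    """Return (false-positive indices, false-negative indices) when a case is
    called positive if its score is at or above the threshold."""
    fp = {i for i, (y, s) in enumerate(zip(labels, scores)) if y == 0 and s >= threshold}
    fn = {i for i, (y, s) in enumerate(zip(labels, scores)) if y == 1 and s < threshold}
    return fp, fn


# Toy data (hypothetical): 1 = positive test, 0 = negative test.
labels = [0, 0, 1, 1, 0, 1]
lr_scores = [0.10, 0.50, 0.35, 0.80, 0.20, 0.70]  # stand-in for Logistic Regression
rf_scores = [0.20, 0.30, 0.55, 0.90, 0.65, 0.40]  # stand-in for Random Forest
ens_scores = average_ensemble(lr_scores, rf_scores)

for name, scores in [("LR", lr_scores), ("RF", rf_scores), ("Ensemble", ens_scores)]:
    print(name, "AUROC:", round(auroc(labels, scores), 3))

# Pair-wise comparison of the errors two models make at the same threshold.
lr_fp, lr_fn = errors_at_threshold(labels, lr_scores, 0.5)
rf_fp, rf_fn = errors_at_threshold(labels, rf_scores, 0.5)
print("false positives shared by LR and RF:", lr_fp & rf_fp)
print("false negatives shared by LR and RF:", lr_fn & rf_fn)
```

The pairwise loop in `auroc` is quadratic in the number of cases, which is fine for a toy example; at the cohort size reported here a rank-based computation from a statistics library would be the practical choice.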

RESULTS

The Logistic Regression, Random Forest, and Ensemble models yielded test AUROCs of 0.839, 0.851, and 0.866, respectively. Significant covariates that may affect risk include age, immunocompromising treatments, past antibiotic use, and some medications for the gastrointestinal tract.

CONCLUSIONS

The models achieve high discrimination performance (AUROC >0.83). The different analysis approaches broadly agree on the predictors that affect a patient's chance of testing positive, and which may therefore influence risk; these include features clinically proven to increase susceptibility. These human-interpretable models can help identify the significant predictors of a positive test result.

