Applications of Bayesian shrinkage prior models in clinical research with categorical responses.

Affiliations

Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY, USA.

Biostatistics & Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY, USA.

Publication information

BMC Med Res Methodol. 2022 Apr 28;22(1):126. doi: 10.1186/s12874-022-01560-6.

Abstract

BACKGROUND

Prediction and classification algorithms are commonly used in clinical research to identify patients susceptible to clinical conditions such as diabetes, colon cancer, and Alzheimer's disease. Developing accurate prediction and classification methods benefits personalized medicine. Building a good predictive model involves selecting the features that are most significantly associated with the outcome. These features can include biological and demographic characteristics, such as genomic biomarkers and health history. Such variable selection becomes challenging when the number of potential predictors is large. Bayesian shrinkage models have emerged as popular and flexible methods of variable selection in regression settings. This work discusses variable selection with three shrinkage priors and illustrates their application to real-world clinical data: the Pima Indians Diabetes, Colon cancer, ADNI, and OASIS Alzheimer's datasets.

METHODS

A unified Bayesian hierarchical framework that implements and compares shrinkage priors in binary and multinomial logistic regression models is presented. The key feature is the representation of the likelihood by a Polya-Gamma data augmentation, which integrates naturally with a family of shrinkage priors; the focus here is on the Horseshoe, Dirichlet Laplace, and Double Pareto priors. Extensive simulation studies are conducted to assess performance under different data dimensions and parameter settings. Accuracy, AUC, Brier score, L1 error, cross-entropy, and ROC surface plots are used as evaluation criteria to compare the priors with frequentist methods such as Lasso, Elastic-Net, and Ridge regression.
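To make the data-augmentation idea concrete, here is a minimal sketch of a Gibbs sampler for binary logistic regression with a Horseshoe prior. It is not the authors' implementation, and it rests on several assumptions. The Polya-Gamma draw is approximated by truncating the known infinite sum-of-gammas representation of PG(1, c), not drawn with an exact sampler. The Horseshoe scales are updated with an inverse-gamma auxiliary-variable scheme chosen here for simplicity. The function names (`sample_pg_approx`, `inv_gamma`, `gibbs_horseshoe_logistic`) and the clipping bounds are hypothetical.

```python
import numpy as np

def sample_pg_approx(c, rng, n_terms=200):
    """Approximate PG(1, c) draws by truncating the sum-of-gammas series."""
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(1.0, size=(c.size, n_terms))  # Gamma(1, 1) variates
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def inv_gamma(shape, scale, rng):
    """Draw from an inverse-gamma distribution (elementwise over `scale`)."""
    return 1.0 / rng.gamma(shape, 1.0 / np.asarray(scale, dtype=float))

def gibbs_horseshoe_logistic(X, y, n_iter=300, seed=0):
    """Return an (n_iter, p) array of coefficient draws; y holds 0/1 labels."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5                        # appears in the augmented likelihood
    beta, lam2, nu = np.zeros(p), np.ones(p), np.ones(p)
    tau2, xi = 1.0, 1.0
    lo, hi = 1e-8, 1e8                     # assumed numerical guards on scales
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # 1. Latent Polya-Gamma variables given the linear predictor.
        omega = sample_pg_approx(X @ beta, rng)
        # 2. Coefficients: Gaussian with precision X' Omega X + prior precision.
        precision = X.T @ (omega[:, None] * X) + np.diag(1.0 / (lam2 * tau2))
        chol = np.linalg.cholesky(precision)
        mean = np.linalg.solve(precision, X.T @ kappa)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(p))
        # 3. Local scales (lam2) and their auxiliary variables (nu).
        lam2 = np.clip(inv_gamma(1.0, 1.0 / nu + beta**2 / (2.0 * tau2), rng), lo, hi)
        nu = inv_gamma(1.0, 1.0 + 1.0 / lam2, rng)
        # 4. Global scale (tau2) and its auxiliary variable (xi).
        tau2_scale = 1.0 / xi + np.sum(beta**2 / (2.0 * lam2))
        tau2 = float(np.clip(inv_gamma((p + 1) / 2.0, tau2_scale, rng), lo, hi))
        xi = float(inv_gamma(1.0, 1.0 + 1.0 / tau2, rng))
        draws[t] = beta
    return draws
```

Given the latent Polya-Gamma variables, the coefficient update is an ordinary Gaussian draw, which is why a normal scale-mixture prior such as the Horseshoe slots in without further approximation. Swapping in another shrinkage prior would change only steps 3 and 4 of the loop.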

RESULTS

All three priors yield robust prediction across the key evaluation metrics, irrespective of the categorical response model chosen. In simulation studies, the mean prediction accuracy was 91.6% (95% CI: 88.5, 94.7) for the logistic regression model and 76.5% (95% CI: 69.3, 83.8) for the multinomial logistic model. The model can identify significant variables for disease risk prediction and is computationally efficient.
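For readers who want to reproduce figures of this kind on their own predictions, the binary-response criteria named in the Methods (accuracy, AUC, Brier score, cross-entropy) can be computed as in the sketch below. The function name `binary_metrics`, the 0.5 classification threshold, and the clipping constant are assumptions made for illustration; the abstract does not state how the authors computed these quantities.

```python
import numpy as np

def binary_metrics(y_true, p_hat, threshold=0.5, eps=1e-12):
    """Accuracy, AUC, Brier score and cross-entropy for 0/1 labels."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities so the logarithms below stay finite.
    p = np.clip(np.asarray(p_hat, dtype=float), eps, 1.0 - eps)
    accuracy = np.mean((p >= threshold) == (y == 1))
    brier = np.mean((p - y) ** 2)
    cross_entropy = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    # AUC as the probability that a random positive case is scored above
    # a random negative case, counting ties as one half.
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)
    return {
        "accuracy": float(accuracy),
        "auc": float(auc),
        "brier": float(brier),
        "cross_entropy": float(cross_entropy),
    }

# Small worked example with four cases.
metrics = binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise AUC computation uses memory proportional to the number of positive cases times the number of negative cases, which is fine for modest test sets but would call for a rank-based formula on large ones.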

CONCLUSIONS

Owing to their strong shrinkage properties, the models are robust enough for both variable selection and prediction, and they apply to a broad range of classification problems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c06/9047306/9513d658a1aa/12874_2022_1560_Fig1_HTML.jpg
