Applications of Bayesian shrinkage prior models in clinical research with categorical responses.

Affiliations

Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY, USA.

Biostatistics & Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY, USA.

Publication information

BMC Med Res Methodol. 2022 Apr 28;22(1):126. doi: 10.1186/s12874-022-01560-6.

DOI:10.1186/s12874-022-01560-6

PMID:35484507

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9046716/

Abstract

BACKGROUND

Prediction and classification algorithms are commonly used in clinical research for identifying patients susceptible to clinical conditions such as diabetes, colon cancer, and Alzheimer's disease. Developing accurate prediction and classification methods benefits personalized medicine. Building an accurate predictive model involves selecting the features most strongly associated with the outcome. These features can include biological and demographic characteristics, such as genomic biomarkers and health history. Such variable selection becomes challenging when the number of potential predictors is large. Bayesian shrinkage models have emerged as popular and flexible methods of variable selection in regression settings. This work discusses variable selection with three shrinkage priors and illustrates their application to real-world clinical data: the Pima Indians Diabetes, colon cancer, ADNI, and OASIS Alzheimer's datasets.

METHODS

A unified Bayesian hierarchical framework is presented that implements and compares shrinkage priors in binary and multinomial logistic regression models. The key feature is the representation of the likelihood through Polya-Gamma data augmentation, which integrates naturally with a family of shrinkage priors; the focus here is on the Horseshoe, Dirichlet Laplace, and Double Pareto priors. Extensive simulation studies are conducted to assess performance under different data dimensions and parameter settings. Accuracy, AUC, Brier score, L1 error, cross-entropy, and ROC surface plots are used as evaluation criteria to compare the priors with frequentist methods such as Lasso, Elastic-Net, and Ridge regression.
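
As a rough illustration of why Polya-Gamma augmentation pairs well with shrinkage priors, the block below writes out the standard Polya-Gamma identity for the binary logistic case and the conditionally Gaussian updates it yields. The abstract gives no formulas, so all notation here is an assumption: y_i, x_i, beta, the latent omega_i, and the prior covariance D are illustrative symbols. The horseshoe is shown as one example of a Gaussian scale-mixture prior; the exact hierarchy used in the paper may differ.

```latex
% Binary logistic likelihood, with \psi_i = x_i^\top \beta and \kappa_i = y_i - 1/2:
\[
  p(y_i \mid \beta)
  = \frac{e^{y_i \psi_i}}{1 + e^{\psi_i}}
  = \frac{1}{2}\, e^{\kappa_i \psi_i}
    \int_0^\infty e^{-\omega_i \psi_i^2 / 2}\, p(\omega_i)\, d\omega_i ,
  \qquad \omega_i \sim \mathrm{PG}(1, 0).
\]
% Conditional on \omega, the likelihood is Gaussian in \beta, so a Gaussian
% scale-mixture prior \beta \sim N(0, D) gives closed-form Gibbs updates:
\[
  \omega_i \mid \beta \sim \mathrm{PG}\!\left(1,\; x_i^\top \beta\right),
  \qquad
  \beta \mid \omega, y \sim N\!\left(m_\omega, V_\omega\right),
\]
\[
  V_\omega = \left(X^\top \Omega X + D^{-1}\right)^{-1},
  \qquad
  m_\omega = V_\omega X^\top \kappa ,
  \qquad
  \Omega = \mathrm{diag}(\omega_1, \dots, \omega_n).
\]
% Example scale mixture (horseshoe): D = \tau^2\, \mathrm{diag}(\lambda_1^2, \dots, \lambda_p^2)
\[
  \beta_j \mid \lambda_j, \tau \sim N\!\left(0,\; \lambda_j^2 \tau^2\right),
  \qquad
  \lambda_j \sim C^{+}(0, 1).
\]
```

Under this representation, swapping one shrinkage prior for another changes only how D is updated, while the updates for omega and beta keep the same form.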

RESULTS

All three priors can be used for robust prediction across the evaluation metrics, irrespective of the categorical response model chosen. In simulation studies, the mean prediction accuracy was 91.6% (95% CI: 88.5, 94.7) for logistic regression and 76.5% (95% CI: 69.3, 83.8) for multinomial logistic models. The models can identify significant variables for disease risk prediction and are computationally efficient.
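
For readers unfamiliar with the evaluation criteria named in the Methods, the following is a minimal sketch of how accuracy, Brier score, cross-entropy, and AUC could be computed from predicted probabilities in the binary case. It is not the authors' code; the function names, the 0.5 classification threshold, and the toy data are assumptions made for illustration, and the multinomial metrics (such as ROC surfaces) are not covered.

```python
import math


def accuracy(y_true, p_hat, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in p_hat]
    return sum(int(y == d) for y, d in zip(y_true, preds)) / len(y_true)


def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def cross_entropy(y_true, p_hat, eps=1e-12):
    """Average negative log-likelihood of the labels under the predictions."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def auc(y_true, p_hat):
    """Probability that a random positive is scored above a random negative
    (ties count one half); equivalent to the area under the ROC curve."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities, for illustration only.
    y = [1, 0, 1, 1, 0, 0]
    p = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7]
    print("accuracy:", accuracy(y, p))
    print("Brier score:", brier_score(y, p))
    print("cross-entropy:", cross_entropy(y, p))
    print("AUC:", auc(y, p))
```

Lower Brier score and cross-entropy indicate better-calibrated probabilities, whereas higher accuracy and AUC indicate better discrimination, which is why the metrics are usually reported together.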

CONCLUSIONS

The models are robust enough to conduct both variable selection and prediction because of their high shrinkage properties and applicability to a broad range of classification problems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c06/9047306/9513d658a1aa/12874_2022_1560_Fig1_HTML.jpg

Similar articles

Impact of prior specifications in a shrinkage-inducing Bayesian model for quantitative trait mapping and genomic prediction.

Genet Sel Evol. 2013 Jul 8;45(1):24. doi: 10.1186/1297-9686-45-24.

Prediction models for clustered data with informative priors for the random effects: a simulation study.

BMC Med Res Methodol. 2018 Aug 6;18(1):83. doi: 10.1186/s12874-018-0543-5.

Review and evaluation of penalised regression methods for risk prediction in low-dimensional data with few events.

Stat Med. 2016 Mar 30;35(7):1159-77. doi: 10.1002/sim.6782. Epub 2015 Oct 29.

Bayesian Variable Shrinkage and Selection in Compositional Data Regression: Application to Oral Microbiome.

J Indian Soc Probab Stat. 2024;25(2):491-515. doi: 10.1007/s41096-024-00194-9. Epub 2024 May 29.

Dirichlet-Laplace priors for optimal shrinkage.

J Am Stat Assoc. 2015 Dec 1;110(512):1479-1490. doi: 10.1080/01621459.2014.960967. Epub 2014 Sep 25.

The reciprocal Bayesian LASSO.

Stat Med. 2021 Sep 30;40(22):4830-4849. doi: 10.1002/sim.9098. Epub 2021 Jun 14.

Framework for personalized prediction of treatment response in relapsing remitting multiple sclerosis.

BMC Med Res Methodol. 2020 Feb 7;20(1):24. doi: 10.1186/s12874-020-0906-6.

Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data.

BMC Bioinformatics. 2007 Feb 28;8:67. doi: 10.1186/1471-2105-8-67.

Regularizing priors for Bayesian VAR applications to large ecological datasets.

PeerJ. 2022 Nov 8;10:e14332. doi: 10.7717/peerj.14332. eCollection 2022.

Cited by

Socioeconomic status and lifestyle as factors of multimorbidity among older adults in China: results from the China Health and Retirement Longitudinal Survey.

Front Public Health. 2025 Jul 30;13:1586091. doi: 10.3389/fpubh.2025.1586091. eCollection 2025.

Prevalence and factors associated with Chinese herbal medicine use among middle-aged and older Chinese adults with diabetes mellitus.

Front Pharmacol. 2025 May 22;16:1482228. doi: 10.3389/fphar.2025.1482228. eCollection 2025.

Use of machine learning in predicting continuity of HIV treatment in selected Nigerian States.

PLOS Glob Public Health. 2025 Apr 24;5(4):e0004497. doi: 10.1371/journal.pgph.0004497. eCollection 2025.

James-Stein Estimator Improves Accuracy and Sample Efficiency in Human Kinematic and Metabolic Data.

Ann Biomed Eng. 2025 Apr 16. doi: 10.1007/s10439-025-03718-x.

Development of a Feed Conversion Ratio Prediction Model for Yorkshire Boars Using Cumulative Feed Intake.

Animals (Basel). 2025 Feb 11;15(4):507. doi: 10.3390/ani15040507.

James-Stein estimator improves accuracy and sample efficiency in human kinematic and metabolic data.

bioRxiv. 2024 Oct 17:2024.10.07.616339. doi: 10.1101/2024.10.07.616339.

Logistic regression analysis of the value of biomarkers, clinical symptoms, and imaging examinations in COVID-19 for SARS-CoV-2 nucleic acid detection.

Medicine (Baltimore). 2024 May 10;103(19):e38186. doi: 10.1097/MD.0000000000038186.

Revisiting the complex time-varying effect of non-pharmaceutical interventions on COVID-19 transmission in the United States.

Front Public Health. 2024 Feb 21;12:1343950. doi: 10.3389/fpubh.2024.1343950. eCollection 2024.

Applying precision medicine principles to the management of multimorbidity: the utility of comorbidity networks, graph machine learning, and knowledge graphs.

Front Med (Lausanne). 2024 Jan 24;10:1302844. doi: 10.3389/fmed.2023.1302844. eCollection 2023.

Clinical value of serum DJ-1 in lung adenocarcinoma.

PeerJ. 2024 Jan 29;12:e16845. doi: 10.7717/peerj.16845. eCollection 2024.
