Is Seeing Believing? A Practitioner's Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies.

Authors

Fan Kun, Subedi Srijana, Yang Gongshun, Lu Xi, Ren Jie, Wu Cen

Affiliations

Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.

Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX 77204, USA.

Publication information

Entropy (Basel). 2024 Sep 16;26(9):794. doi: 10.3390/e26090794.

Abstract

Variable selection methods have been extensively developed for and applied to cancer genomics data to identify important omics features associated with complex disease traits, including cancer outcomes. However, the reliability and reproducibility of the findings are in question if valid inferential procedures are not available to quantify the uncertainty of the findings. In this article, we provide a gentle but systematic review of high-dimensional frequentist and Bayesian inferential tools under sparse models which can yield uncertainty quantification measures, including confidence (or Bayesian credible) intervals, p-values and false discovery rates (FDR). Connections in high-dimensional inferences between the two realms have been fully exploited under the "unpenalized loss function + penalty term" formulation for regularization methods and the "likelihood function × shrinkage prior" framework for regularized Bayesian analysis. In particular, we advocate for robust Bayesian variable selection in cancer genomics studies due to its ability to accommodate disease heterogeneity in the form of heavy-tailed errors and structured sparsity while providing valid statistical inference. The numerical results show that robust Bayesian analysis incorporating exact sparsity has yielded not only superior estimation and identification results but also valid Bayesian credible intervals under nominal coverage probabilities compared with alternative methods, especially in the presence of heavy-tailed model errors and outliers.
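
The correspondence between the "unpenalized loss function + penalty term" formulation and the "likelihood function × shrinkage prior" framework can be illustrated with the textbook LASSO and Laplace-prior pairing. The sketch below is a standard example, not necessarily the specific formulation used in the article; the symbols $L$, $f$, $\lambda$, $\beta$ and $p$ are notation assumed here for illustration.

```latex
% Regularization view: unpenalized loss + penalty term (LASSO penalty as an example)
\hat{\beta}_{\mathrm{pen}}
  = \arg\min_{\beta}\Big\{ L(\beta; y, X) + \lambda \sum_{j=1}^{p} |\beta_j| \Big\}

% Bayesian view: likelihood x shrinkage prior (independent Laplace priors as an example)
\pi(\beta \mid y, X) \;\propto\; f(y \mid X, \beta) \times \prod_{j=1}^{p} \pi(\beta_j),
\qquad \pi(\beta_j) \propto \exp\!\big(-\lambda |\beta_j|\big)

% Taking L(\beta; y, X) = -\log f(y \mid X, \beta), the negative log-posterior is
-\log \pi(\beta \mid y, X)
  = L(\beta; y, X) + \lambda \sum_{j=1}^{p} |\beta_j| + \text{const},

% so the posterior mode coincides with the penalized estimate:
\hat{\beta}_{\mathrm{MAP}} = \hat{\beta}_{\mathrm{pen}}
```

Under this pairing the penalized estimator is only the mode of the posterior. The full posterior additionally supplies credible intervals, which is the kind of uncertainty quantification the abstract refers to.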

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff45/11430850/67e34c48bd35/entropy-26-00794-g0A1.jpg
