Rank-based Bayesian variable selection for genome-wide transcriptomic analyses.

Affiliations

Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway.

Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.

Publication information

Stat Med. 2022 Oct 15;41(23):4532-4553. doi: 10.1002/sim.9524. Epub 2022 Jul 18.

DOI:10.1002/sim.9524

PMID:35844145

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9796757/

Abstract

Variable selection is crucial in high-dimensional omics-based analyses, since it is biologically reasonable to assume only a subset of non-noisy features contributes to the data structures. However, the task is particularly hard in an unsupervised setting, and a priori ad hoc variable selection is still a very frequent approach, despite the evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank-based unsupervised transcriptomic analysis. Making use of data rankings instead of the actual continuous measurements increases the robustness of conclusions when compared to classical statistical methods, and embedding variable selection into the inferential tasks allows complete reproducibility. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows for a full probabilistic analysis, leading to coherent quantification of uncertainties. Simulation studies demonstrate the versatility and robustness of the proposed method in a variety of scenarios, as well as its superiority with respect to several competitors when varying the data dimension or data generating process. We use the novel approach to analyze genome-wide RNAseq gene expression data from ovarian cancer patients: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the usefulness of the method in the context of signature discovery for cancer genomics. Moreover, the possibility to also perform uncertainty quantification plays a key role in the subsequent biological investigation.
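The abstract's core idea, replacing continuous expression measurements with within-sample rankings and modelling them with a Mallows-type distribution, can be illustrated with a minimal sketch. It is not the authors' implementation and omits the variable selection step entirely. The function names (`to_ranks`, `footrule`, `mallows_log_kernel`), the choice of the footrule distance, the convention that rank 1 is the most highly expressed gene, and the `alpha / n` scaling are all assumptions made for illustration; the normalising constant of the Mallows density is left out.

```python
import numpy as np


def to_ranks(expression):
    """Rank genes within each sample (row); rank 1 = highest expression.

    Assumption: ties are broken by column order, which real data may
    need to handle more carefully.
    """
    expression = np.asarray(expression, dtype=float)
    # Stable sort on negated values gives a descending order per row.
    order = np.argsort(-expression, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(expression.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, expression.shape[1] + 1)
    return ranks


def footrule(r, rho):
    """Footrule (L1) distance between two rank vectors."""
    return int(np.abs(np.asarray(r) - np.asarray(rho)).sum())


def mallows_log_kernel(ranks, rho, alpha):
    """Unnormalised Mallows log-likelihood: -(alpha / n) * sum_j d(R_j, rho).

    Hypothetical helper: the log normalising constant is omitted, so the
    value is only comparable across consensus rankings rho at fixed alpha.
    """
    n = len(rho)
    total = sum(footrule(r, rho) for r in ranks)
    return -alpha / n * total


# Two samples (rows) measured on three genes (columns).
expression = [[5.0, 1.0, 3.0],
              [0.2, 0.9, 0.4]]
ranks = to_ranks(expression)
consensus = [1, 2, 3]  # hypothetical consensus ranking of the three genes
log_kernel = mallows_log_kernel(ranks, consensus, alpha=3.0)
```

Because only the ordering of the measurements within each sample enters the computation, any monotone transformation of the expression values (for example a log transform) leaves `ranks` unchanged, which is one way to read the robustness claim in the abstract.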

Similar articles

A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data.

Biostatistics. 2018 Jan 1;19(1):71-86. doi: 10.1093/biostatistics/kxx017.

Transcriptomic pan-cancer analysis using rank-based Bayesian inference.

Mol Oncol. 2023 Apr;17(4):548-563. doi: 10.1002/1878-0261.13354. Epub 2023 Jan 23.

Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.

Res Rep Health Eff Inst. 2015 Jun(183 Pt 1-2):5-50.

Is Seeing Believing? A Practitioner's Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies.

Entropy (Basel). 2024 Sep 16;26(9):794. doi: 10.3390/e26090794.

Bayesian variable selection with graphical structure learning: Applications in integrative genomics.

PLoS One. 2018 Jul 30;13(7):e0195070. doi: 10.1371/journal.pone.0195070. eCollection 2018.

High-dimensional genomic feature selection with the ordered stereotype logit model.

Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac414.

Bayesian integrative model for multi-omics data with missingness.

Bioinformatics. 2018 Nov 15;34(22):3801-3808. doi: 10.1093/bioinformatics/bty775.

Joint Bayesian variable and graph selection for regression models with network-structured predictors.

Stat Med. 2016 Mar 30;35(7):1017-31. doi: 10.1002/sim.6792. Epub 2015 Oct 29.

Model uncertainty quantification in Cox regression.

Biometrics. 2023 Sep;79(3):1726-1736. doi: 10.1111/biom.13823. Epub 2023 Jan 17.

Cited by

Transcriptomic pan-cancer analysis using rank-based Bayesian inference.

Mol Oncol. 2023 Apr;17(4):548-563. doi: 10.1002/1878-0261.13354. Epub 2023 Jan 23.
