An empirical Bayesian ranking method, with applications to high throughput biology.

Affiliations

Biostatistics Division, HRB Clinical Research Facility, National University of Ireland Galway, Galway, Ireland.

Department of Statistics and Data Science, Yale University, New Haven, CT, USA.

Publication information

Bioinformatics. 2020 Jan 1;36(1):177-185. doi: 10.1093/bioinformatics/btz471.

PMID:31197345

Abstract

MOTIVATION

In bioinformatics, genome-wide experiments look for important biological differences between two groups at a large number of locations in the genome. Often, the final analysis focuses on a P-value-based ranking of locations, which might then be investigated further in follow-up experiments. However, this strategy may rank small effects with low P-values more favorably than larger, more scientifically important effects. Bayesian ranking techniques may offer a solution to this problem, provided a good prior for the distribution of effect sizes across locations is available.
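A toy illustration of the problem described above, using two hypothetical loci with made-up numbers (not data from the paper): locus A has a small effect measured very precisely, locus B a large effect measured noisily. Assuming a simple two-sided z-test, A gets the smaller P-value and is ranked first, even though B has the larger effect.

```python
import math


def two_sided_p(beta, se):
    """Two-sided z-test P-value: 2 * (1 - Phi(|beta / se|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Hypothetical loci: name -> (estimated effect size, standard error).
LOCI = {
    "A": (0.10, 0.02),  # small effect, measured very precisely (z = 5)
    "B": (1.50, 0.50),  # large effect, measured noisily (z = 3)
}


def rank_by_p(loci):
    """Order loci by P-value, smallest first."""
    return sorted(loci, key=lambda n: two_sided_p(*loci[n]))


def rank_by_effect(loci):
    """Order loci by absolute estimated effect, largest first."""
    return sorted(loci, key=lambda n: -abs(loci[n][0]))


for name, (beta, se) in LOCI.items():
    print(f"{name}: effect={beta:.2f}, P={two_sided_p(beta, se):.1e}")
print("P-value order:", rank_by_p(LOCI))
print("Effect-size order:", rank_by_effect(LOCI))
```

Neither ordering is ideal on its own: the P-value ignores magnitude, while the raw effect estimate ignores how noisy it is. A Bayesian ranking tries to weigh both.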

RESULTS

We develop an empirical Bayes ranking algorithm that uses the marginal distribution of the data over all locations to estimate an appropriate prior. In simulations and analyses of real datasets, we demonstrate favorable performance compared with ordering by P-value and a number of other competing ranking methods. The algorithm is computationally efficient and can be used either to rank all genomic locations or to rank a subset of locations pre-selected via traditional FWER/FDR methods in a two-stage analysis.
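The general empirical Bayes idea (fit a prior from all locations, then rank by a posterior quantity) can be sketched as follows. This is a deliberately simple parametric stand-in, NOT the algorithm implemented in EBrank: it assumes estimates beta_hat_i ~ N(beta_i, se_i^2) with a normal prior beta_i ~ N(0, tau^2), fits tau^2 by the method of moments from the marginal distribution, and ranks by the posterior mean of |beta_i|. The Benjamini-Hochberg step is one assumed choice of FDR method for the optional first stage. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def two_sided_p(beta, se):
    """Two-sided z-test P-value for one location."""
    return 2.0 * STD_NORMAL.cdf(-abs(beta / se))


def estimate_prior_variance(betas, ses):
    """Method-of-moments estimate of tau^2 from the marginal distribution.

    Assumed model: beta_hat_i ~ N(beta_i, se_i^2) and beta_i ~ N(0, tau^2),
    so marginally E[beta_hat_i^2] = tau^2 + se_i^2.
    """
    excess = [b * b - s * s for b, s in zip(betas, ses)]
    return max(0.0, sum(excess) / len(excess))


def posterior_abs_effect(beta, se, tau2):
    """Posterior mean of |beta_i| under the normal prior (a folded-normal mean)."""
    shrink = tau2 / (tau2 + se * se)  # assumes se > 0
    m = shrink * beta                 # posterior mean of beta_i
    v = shrink * se * se              # posterior variance of beta_i
    if v == 0.0:
        return abs(m)
    sd = math.sqrt(v)
    return (sd * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * v))
            + m * (1.0 - 2.0 * STD_NORMAL.cdf(-m / sd)))


def eb_rank(betas, ses, subset=None):
    """Indices ranked by posterior expected |effect|, largest first.

    The prior is always estimated from ALL locations; `subset` (e.g. the
    locations kept by a first-stage FDR filter) only restricts which are ranked.
    """
    tau2 = estimate_prior_variance(betas, ses)
    idx = range(len(betas)) if subset is None else subset
    scores = {i: posterior_abs_effect(betas[i], ses[i], tau2) for i in idx}
    return sorted(scores, key=lambda i: -scores[i])


def bh_select(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: indices declared significant at FDR q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return set(order[:k])


if __name__ == "__main__":
    # Hypothetical toy data: estimated effects and standard errors.
    betas = [0.10, 1.50, 0.00, 0.20, -0.30, 1.00]
    ses = [0.02, 0.50, 0.30, 0.30, 0.30, 0.30]
    pvals = [two_sided_p(b, s) for b, s in zip(betas, ses)]
    stage1 = bh_select(pvals, q=0.05)  # stage 1: FDR pre-selection
    print("EB ranking (all):", eb_rank(betas, ses))
    print("EB ranking (BH-selected):", eb_rank(betas, ses, subset=stage1))
```

With these toy numbers, the precisely measured but tiny effect at index 0 no longer outranks the larger effect at index 1. Estimating the prior from every location, even when only a pre-selected subset is ranked, mirrors the two-stage use described in the abstract. A real analysis would need a more flexible prior than a single zero-centred normal.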

AVAILABILITY AND IMPLEMENTATION

An R-package, EBrank, implementing the ranking algorithm is available on CRAN.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

An empirical Bayesian ranking method, with applications to high throughput biology.

Bioinformatics. 2020 Jan 1;36(1):177-185. doi: 10.1093/bioinformatics/btz471.

Modified screening and ranking algorithm for copy number variation detection.

Bioinformatics. 2015 May 1;31(9):1341-8. doi: 10.1093/bioinformatics/btu850. Epub 2014 Dec 25.

GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies.

Bioinformatics. 2019 Jan 1;35(1):1-11. doi: 10.1093/bioinformatics/bty472.

Empirical Bayes screening of many p-values with applications to microarray studies.

Bioinformatics. 2005 May 1;21(9):1987-94. doi: 10.1093/bioinformatics/bti301. Epub 2005 Feb 2.

synbreed: a framework for the analysis of genomic prediction data using R.

Bioinformatics. 2012 Aug 1;28(15):2086-7. doi: 10.1093/bioinformatics/bts335. Epub 2012 Jun 10.

Accurate and efficient estimation of small P-values with the cross-entropy method: applications in genomic data analysis.

Bioinformatics. 2019 Jul 15;35(14):2441-2448. doi: 10.1093/bioinformatics/bty1005.

RANKS: a flexible tool for node label ranking and classification in biological networks.

Bioinformatics. 2016 Sep 15;32(18):2872-4. doi: 10.1093/bioinformatics/btw235. Epub 2016 Jun 2.

Family Rank: a graphical domain knowledge informed feature ranking algorithm.

Bioinformatics. 2021 Oct 25;37(20):3626-3631. doi: 10.1093/bioinformatics/btab387.

ScreenBEAM: a novel meta-analysis algorithm for functional genomics screens via Bayesian hierarchical modeling.

Bioinformatics. 2016 Jan 15;32(2):260-7. doi: 10.1093/bioinformatics/btv556. Epub 2015 Sep 28.

Generalized empirical Bayesian methods for discovery of differential data in high-throughput biology.

Bioinformatics. 2016 Jan 15;32(2):195-202. doi: 10.1093/bioinformatics/btv569. Epub 2015 Oct 1.

Cited by

Bayesian Effect Size Ranking to Prioritise Genetic Risk Variants in Common Diseases for Follow-Up Studies.

Genet Epidemiol. 2025 Jan;49(1):e22608. doi: 10.1002/gepi.22608.

Stem cell transcriptional profiles from mouse subspecies reveal cis-regulatory evolution at translation genes.

Heredity (Edinb). 2024 Nov;133(5):308-316. doi: 10.1038/s41437-024-00715-z. Epub 2024 Aug 20.
