Generalized empirical Bayesian methods for discovery of differential data in high-throughput biology.

Author information

Hardcastle Thomas J

Affiliation

Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, UK.

Publication information

Bioinformatics. 2016 Jan 15;32(2):195-202. doi: 10.1093/bioinformatics/btv569. Epub 2015 Oct 1.

DOI:10.1093/bioinformatics/btv569

PMID:26428289

Abstract

MOTIVATION

High-throughput data are now commonplace in biological research. Rapidly changing technologies and applications mean that novel methods for detecting differential behaviour in a 'large P, small n' setting are required at an increasing rate. Such methods are generally developed on an ad hoc basis, which demands repeated development cycles and leads to a lack of standardization between analyses.

RESULTS

We present here a generalized method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach is based on our baySeq algorithm for identification of differential expression in RNA-seq data based on a negative binomial distribution, and in paired data based on a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. A number of enhancements to the basic algorithm are also presented to increase flexibility and reduce computational costs.
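The idea of applying one empirical Bayesian procedure to any parametric distribution can be illustrated with a minimal sketch. This is not the baySeq implementation; it is a simplified illustration under stated assumptions. All function names are hypothetical, the Poisson distribution is chosen only for brevity, every row contributes to the empirical prior, and the prior probability of differential behaviour is fixed at an assumed value instead of being estimated from the data. The distribution enters only through the `fit` and `log_density` arguments, so another parametric family could in principle be swapped in without changing the rest of the code.

```python
import math


def log_mean_exp(values):
    """Numerically stable log(mean(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def poisson_fit(xs):
    """Maximum-likelihood Poisson rate, floored so that log(rate) stays finite."""
    return max(sum(xs) / len(xs), 1e-3)


def poisson_log_density(x, rate):
    """Log probability of count x under a Poisson distribution with this rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def log_marginal(xs, prior_sample, log_density):
    """Log of the likelihood of xs averaged over an empirical prior sample."""
    return log_mean_exp(
        [sum(log_density(x, theta) for x in xs) for theta in prior_sample]
    )


def posterior_differential(rows, groups, fit, log_density, prior_de=0.1):
    """Posterior probability, per row, that the groups do not share parameters.

    prior_de is an assumed fixed prior probability of differential behaviour.
    """
    labels = sorted(set(groups))
    members = {g: [i for i, lab in enumerate(groups) if lab == g] for g in labels}
    # Empirical priors: parameter estimates fitted to the data set itself.
    shared_prior = [fit(row) for row in rows]
    group_prior = {
        g: [fit([row[i] for i in idx]) for row in rows]
        for g, idx in members.items()
    }
    posteriors = []
    for row in rows:
        # Model 1: all samples share one parameter.
        log_null = log_marginal(row, shared_prior, log_density)
        # Model 2: each group has its own parameter.
        log_de = sum(
            log_marginal([row[i] for i in members[g]], group_prior[g], log_density)
            for g in labels
        )
        a = math.log(prior_de) + log_de
        b = math.log(1.0 - prior_de) + log_null
        m = max(a, b)
        posteriors.append(math.exp(a - m) / (math.exp(a - m) + math.exp(b - m)))
    return posteriors


# Made-up counts: four rows with similar groups, one row with a clear difference.
counts = [
    [10, 12, 9, 11, 10, 13],
    [20, 18, 22, 19, 21, 20],
    [5, 6, 4, 5, 7, 5],
    [30, 28, 33, 31, 29, 30],
    [5, 6, 4, 50, 48, 55],
]
groups = ["A", "A", "A", "B", "B", "B"]
posteriors = posterior_differential(counts, groups, poisson_fit, poisson_log_density)
```

In this toy data set the last row receives a posterior probability close to one, while the other rows stay well below one half, because the two-group model pays a penalty for averaging over an additional empirical prior sample.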

AVAILABILITY AND IMPLEMENTATION

The methods are implemented in the R baySeq (v2) package, available from Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/baySeq.html.

CONTACT

tjh48@cam.ac.uk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Polyester: simulating RNA-seq datasets with differential transcript expression.

Bioinformatics. 2015 Sep 1;31(17):2778-84. doi: 10.1093/bioinformatics/btv272. Epub 2015 Apr 28.

SimSeq: a nonparametric approach to simulation of RNA-sequence datasets.

Bioinformatics. 2015 Jul 1;31(13):2131-40. doi: 10.1093/bioinformatics/btv124. Epub 2015 Feb 26.

EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments.

Bioinformatics. 2015 Aug 15;31(16):2614-22. doi: 10.1093/bioinformatics/btv193. Epub 2015 Apr 5.

bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data.

Bioinformatics. 2020 Feb 15;36(4):1174-1181. doi: 10.1093/bioinformatics/btz726.

R/EBcoexpress: an empirical Bayesian framework for discovering differential co-expression.

Bioinformatics. 2012 Jul 15;28(14):1939-40. doi: 10.1093/bioinformatics/bts268. Epub 2012 May 16.

compcodeR--an R package for benchmarking differential expression methods for RNA-seq data.

Bioinformatics. 2014 Sep 1;30(17):2517-8. doi: 10.1093/bioinformatics/btu324. Epub 2014 May 9.

QUBIC: a bioconductor package for qualitative biclustering analysis of gene co-expression data.

Bioinformatics. 2017 Feb 1;33(3):450-452. doi: 10.1093/bioinformatics/btw635.

baySeq: empirical Bayesian methods for identifying differential expression in sequence count data.

BMC Bioinformatics. 2010 Aug 10;11:422. doi: 10.1186/1471-2105-11-422.

Empirical Bayesian analysis of paired high-throughput sequencing data with a beta-binomial distribution.

BMC Bioinformatics. 2013 Apr 23;14:135. doi: 10.1186/1471-2105-14-135.

Cited by

Reduced circulating sphingolipids and activity are linked to T2D risk and impaired insulin secretion.

Sci Adv. 2025 Jan 10;11(2):eadr1725. doi: 10.1126/sciadv.adr1725.

NR2F1 overexpression alleviates trophoblast cell dysfunction by inhibiting GDF15/MAPK axis in preeclampsia.

Hum Cell. 2024 Sep;37(5):1405-1420. doi: 10.1007/s13577-024-01095-6. Epub 2024 Jul 15.

Evaluation of the antidermatophytic activity of potassium salts of N-acylhydrazinecarbodithioates and their aminotriazole-thione derivatives.

Sci Rep. 2024 Feb 12;14(1):3521. doi: 10.1038/s41598-024-54025-9.

Disulfidptosis and its Role in Peripheral Blood Immune Cells after a Stroke: A New Frontier in Stroke Pathogenesis.

Curr Neurovasc Res. 2024;20(5):608-622. doi: 10.2174/0115672026286243240105115419.

Identification of candidate biomarkers and pathways associated with type 1 diabetes mellitus using bioinformatics analysis.

Sci Rep. 2022 Jun 1;12(1):9157. doi: 10.1038/s41598-022-13291-1.

DysPIA: A Novel Dysregulated Pathway Identification Analysis Method.

Front Genet. 2021 Jul 5;12:647653. doi: 10.3389/fgene.2021.647653. eCollection 2021.

Bioinformatic screening for candidate biomarkers and their prognostic values in endometrial cancer.

BMC Genet. 2020 Sep 22;21(1):113. doi: 10.1186/s12863-020-00898-4.

Identification of crucial genes and pathways associated with colorectal cancer by bioinformatics analysis.

Oncol Lett. 2020 Mar;19(3):1881-1889. doi: 10.3892/ol.2020.11278. Epub 2020 Jan 9.

Prognostic values and prospective pathway signaling of MicroRNA-182 in ovarian cancer: a study based on gene expression omnibus (GEO) and bioinformatics analysis.

J Ovarian Res. 2019 Nov 8;12(1):106. doi: 10.1186/s13048-019-0580-7.

Identification of differentially expressed genes and enriched pathways in lung cancer using bioinformatics analysis.

Mol Med Rep. 2019 Mar;19(3):2029-2040. doi: 10.3892/mmr.2019.9878. Epub 2019 Jan 18.
