Optimization of gene set annotations via entropy minimization over variable clusters (EMVC).

Affiliation

Departments of Genetics and Community and Family Medicine, Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, USA.

Publication information

Bioinformatics. 2014 Jun 15;30(12):1698-706. doi: 10.1093/bioinformatics/btu110. Epub 2014 Feb 25.

DOI:10.1093/bioinformatics/btu110

PMID:24574114

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4058919/

Abstract

MOTIVATION

Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar datasets.

RESULTS

We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled datasets. As shown using simulated gene sets with simulated data and Molecular Signatures Database collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome, resulting in increased gene set enrichment power and better replication of enrichment results.
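
The general idea of filtering annotations to reduce entropy across clusters can be illustrated with a small Python sketch. This is not the published EMVC procedure or the API of the EMVC R package; it is a simplified stand-in that assumes the clusterings (one per bootstrap sample and cluster count) have already been computed, and it replaces the paper's optimization with a plain dominant-cluster vote. The function names `cluster_entropy` and `filter_annotations`, the `keep_threshold` parameter, and the toy gene labels are all hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(members, labels):
    """Shannon entropy (bits) of how a gene set's members spread over clusters.

    members: list of gene names annotated to the set
    labels:  dict mapping gene name -> cluster id for one clustering
    """
    counts = Counter(labels[g] for g in members)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_annotations(members, clusterings, keep_threshold=0.5):
    """Keep genes that sit in the set's dominant cluster in most clusterings.

    Assumption: a simple vote stands in for the entropy minimization
    described in the abstract; the real algorithm may differ.
    """
    votes = Counter()
    for labels in clusterings:
        counts = Counter(labels[g] for g in members)
        dominant = counts.most_common(1)[0][0]  # cluster holding most members
        for g in members:
            if labels[g] == dominant:
                votes[g] += 1
    n = len(clusterings)
    return [g for g in members if votes[g] / n > keep_threshold]


# Toy data: three hypothetical clusterings of five genes.
clusterings = [
    {"a": 0, "b": 0, "c": 0, "d": 1, "x": 1},
    {"a": 0, "b": 0, "c": 0, "d": 0, "x": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0, "x": 0},
]
gene_set = ["a", "b", "c", "x"]  # "x" plays the role of a spurious annotation

filtered = filter_annotations(gene_set, clusterings)
before = [cluster_entropy(gene_set, labels) for labels in clusterings]
after = [cluster_entropy(filtered, labels) for labels in clusterings]
print(filtered, before, after)
```

In this toy case the gene that never clusters with the rest of the set is dropped, and the entropy of the remaining members falls to zero in every clustering. Averaging over many clusterings is what makes such a filter less sensitive to any single choice of cluster count or resample.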

AVAILABILITY AND IMPLEMENTATION

http://cran.r-project.org/web/packages/EMVC/index.html.
