School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, K1N 6N5 Canada.
Bioinformatics. 2011 Sep 1;27(17):2391-8. doi: 10.1093/bioinformatics/btr337. Epub 2011 Jul 9.
Annotation Enrichment Analysis (AEA) is a widely used analytical approach for processing data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information (e.g. GO annotations) for sets of genes identified by experiments (e.g. a set of differentially expressed genes, a cluster). Human experts then use the discovered information to find biological interpretations of the experiments. However, AEA isolates and tests for overrepresentation only individual annotation terms or groups of similar terms, and is limited in its ability to uncover complex phenomena involving relationships among multiple annotation terms from various knowledge bases. AEA also assumes that annotations describe the whole object of interest, which makes it difficult to apply to sets of compound objects (e.g. sets of protein-protein interactions) and to sets of objects that have an internal structure (e.g. protein complexes).
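To illustrate the kind of single-term overrepresentation test described above, the following is a minimal sketch that assumes a one-sided hypergeometric test, a common choice for enrichment analysis but not one the abstract specifies. The function name `enrichment_p_value`, its parameters, and the toy counts are hypothetical and chosen only for illustration.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       sample_size, annotated_in_sample):
    """Upper-tail hypergeometric probability P(X >= annotated_in_sample).

    Assumption: genes in the experimental set are treated as a draw
    without replacement from the background population.
    """
    total = comb(population_size, sample_size)
    not_annotated = population_size - annotated_in_population
    # The overlap cannot exceed either the sample or the annotated set.
    upper = min(annotated_in_population, sample_size)
    tail = sum(
        comb(annotated_in_population, k) * comb(not_annotated, sample_size - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total


# Toy example (made-up counts): 20 background genes, 5 of them carry the
# annotation term; the experiment selects 4 genes, 3 of which carry it.
p = enrichment_p_value(20, 5, 4, 3)
print(f"P-value for the term: {p:.4f}")
```

Each annotation term is tested on its own in this sketch, which is the limitation the abstract points to: the test says nothing about relationships between several terms at once.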
We propose a novel logic-based Annotation Concept Synthesis and Enrichment Analysis (ACSEA) approach. ACSEA fuses inductive logic reasoning with statistical inference to uncover more complex phenomena captured by the experiments. We evaluate our approach on large-scale datasets from several microarray experiments and on a clustered genome-wide genetic interaction network, using different biological knowledge bases. The discovered interpretations have lower P-values than the interpretations found by AEA, are highly integrative in nature, and include analysis of quantitative and structured information present in the knowledge bases. The results suggest that ACSEA can improve the effectiveness of processing high-throughput experiments.