Department of Biotechnology, BOKU University, Muthgasse 18, 1190 Vienna.
Bioinformatics. 2012 Sep 15;28(18):i603-i610. doi: 10.1093/bioinformatics/bts405.
MOTIVATION: Gene expression assays allow for genome-scale analyses of molecular biological mechanisms. State-of-the-art data analysis provides lists of involved genes, either by calculating significance levels of mRNA abundance or by Bayesian assessments of gene activity. A common problem with such approaches is the difficulty of interpreting the biological implications of the resulting gene lists. This has led to increased interest in methods for inferring high-level biological information. A common approach to representing high-level information is to infer gene ontology (GO) terms that may be attributed to the expression data experiment.
RESULTS: This article proposes a probabilistic model for GO term inference. The model assumes that gene annotations to GO terms are available and that gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches to expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and provides a probabilistic GO term assignment as a result. Experiments on synthetic and microarray data suggest that the advantages of the proposed probabilistic GO term inference over statistical test-based approaches are particularly evident for sparsely annotated GO terms and in situations of large uncertainty about gene activity. Provided that appropriate annotations exist, the proposed approach is easily applied to inferring other high-level assignments such as pathways.
AVAILABILITY: Source code under the GPL license is available from the author.
CONTACT: peter.sykacek@boku.ac.at.
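The abstract does not spell out the model itself, so the following is only an illustrative sketch of the kind of input it describes: per-gene posterior probabilities of activity plus gene-to-term annotations. It is not the author's model. The function name `term_activity_summary`, the gene and term identifiers, and the independence assumption between genes are all hypothetical choices made for this example.

```python
from math import prod


def term_activity_summary(gene_probs, annotations):
    """Summarise per-term activity from per-gene posterior probabilities.

    ASSUMPTION: genes are treated as independent, and the two summaries
    below are simple stand-ins, not the model proposed in the article.
    """
    summary = {}
    for term, genes in annotations.items():
        # Keep only annotated genes for which a posterior is available.
        probs = [gene_probs[g] for g in genes if g in gene_probs]
        if not probs:
            continue  # no measured genes for this term
        summary[term] = {
            "n_genes": len(probs),
            # Expected fraction of annotated genes that are active.
            "expected_fraction": sum(probs) / len(probs),
            # Probability that at least one annotated gene is active.
            "p_any_active": 1.0 - prod(1.0 - p for p in probs),
        }
    return summary


# Hypothetical posteriors P(gene is active | data) and annotations.
gene_probs = {"geneA": 0.9, "geneB": 0.6, "geneC": 0.1, "geneD": 0.5}
annotations = {
    "TERM_1": ["geneA", "geneB"],
    "TERM_2": ["geneC", "geneD"],
    "TERM_3": ["geneX"],  # annotated gene without a posterior
}

result = term_activity_summary(gene_probs, annotations)
for term, stats in result.items():
    print(term, stats)
```

Unlike a hard gene list obtained by thresholding, the per-gene probabilities enter the summary directly, so an uncertain gene such as `geneD` still contributes in proportion to its posterior.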