Klie Sebastian, Mutwil Marek, Persson Staffan, Nikoloski Zoran
Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany.
Mol Biosyst. 2012 Sep;8(9):2233-41. doi: 10.1039/c2mb25089f. Epub 2012 Jun 29.
Inferring accurate gene annotations requires integrating existing biological knowledge, structured in the form of an ontology, with data from high-throughput transcriptomics technologies. Even for model organisms, this undertaking requires developing algorithms that integrate genome-scale data. Gene relevance networks have emerged as a powerful representation of the structure of the data. Such networks can be used for intra-species transfer of gene annotations following the guilt-by-association principle. An analogous principle can serve as a basis for inter-species transfer of gene annotations by comparing well-defined subnetworks. In this review, we compare and contrast the concepts of relevance and proximity networks and briefly review the concept of semantic similarity. We then provide a detailed account of quantitative guilt-by-association inference in the setting of genome-scale relevance networks. Moreover, we systematically survey the existing network-based approaches for automated gene function annotation and categorize them within a single framework according to the methodology employed. Furthermore, we discuss suitable data selection strategies required for deriving meaningful and unbiased genome-scale networks from large transcriptomics compendia. Lastly, by simulating gene function prediction with a classical network-based algorithm, we show how the number of genes of unknown function influences prediction within a species, and we pinpoint the need and the requirements for inter-species knowledge transfer.
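To make the guilt-by-association principle concrete, the following is a minimal sketch and not the procedure used in the review. It assumes a relevance network built by thresholding absolute Pearson correlation between expression profiles, and it assumes a simple majority vote among annotated neighbors as the prediction rule. The function names, the threshold of 0.8, the toy expression values, and the labels are all hypothetical.

```python
import numpy as np
from collections import Counter


def relevance_network(expr, threshold=0.8):
    """Return a boolean adjacency matrix linking genes with |Pearson r| >= threshold.

    expr: 2-D array, rows are genes and columns are samples.
    The similarity measure and the threshold are illustrative assumptions.
    """
    corr = np.corrcoef(expr)
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)  # a gene is not its own neighbor
    return adj


def predict_by_neighbors(adj, labels):
    """Guilt by association via majority vote (one simple variant, assumed here).

    labels: list with one entry per gene; None marks a gene of unknown function.
    Returns a dict mapping each unknown gene index to its predicted label,
    or to None when the gene has no annotated neighbor.
    """
    predictions = {}
    for gene, label in enumerate(labels):
        if label is not None:
            continue
        votes = Counter(
            labels[n] for n in np.flatnonzero(adj[gene]) if labels[n] is not None
        )
        predictions[gene] = votes.most_common(1)[0][0] if votes else None
    return predictions


# Toy expression matrix: 5 genes x 5 samples (made-up values).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # gene 0, annotated
    [2.0, 4.0, 6.0, 8.0, 10.5],   # gene 1, annotated
    [1.1, 2.0, 2.9, 4.2, 5.0],    # gene 2, unknown
    [5.0, 1.0, 4.0, 2.0, 3.0],    # gene 3, annotated
    [5.1, 1.2, 3.9, 2.1, 3.0],    # gene 4, unknown
])
labels = ["photosynthesis", "photosynthesis", None, "cell wall", None]

adj = relevance_network(expr, threshold=0.8)
predictions = predict_by_neighbors(adj, labels)
print(predictions)
```

In this toy setting, each unknown gene inherits the annotation of the genes it is co-expressed with. Setting more entries of `labels` to `None` leaves fewer annotated neighbors to vote, which is the kind of effect the simulation described above examines within a species.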