Artificial Intelligence Center, SRI International, Menlo Park, CA, USA.
Bioinformatics. 2011 Sep 15;27(18):2478-85. doi: 10.1093/bioinformatics/btr428. Epub 2011 Jul 19.
Key problems for computational genomics include discovering novel pathways in genome data, and discovering functional interaction partners for genes to define new members of partially elucidated pathways.
We propose a novel method for the discovery of subsystems from annotated genomes. For each gene pair, a score measuring the likelihood that the two genes belong to the same subsystem is computed using genome context methods. Genes are then grouped based on these scores, and the resulting groups are filtered to keep only high-confidence groups. Since the method is based on genome context analysis, it relies solely on structural annotation of the genomes. The method can be used to discover new pathways, find genes missing from a known pathway, find new protein complexes or other kinds of functional groups, and assign function to genes. We tested the accuracy of our method in Escherichia coli K-12. In one configuration of the system, we find that 31.6% of the candidate groups generated by our method closely match a known pathway or protein complex, and that we rediscover 31.2% of all known pathways and protein complexes of at least 4 genes. We believe that a significant proportion of the candidates that do not match any known group in E. coli K-12 correspond to novel subsystems that may represent promising leads for future laboratory research. We discuss in-depth examples of these findings.
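The three steps described above (score gene pairs, group genes, filter groups) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the grouping algorithm or the filtering criteria, so the sketch assumes connected components over high-scoring pairs for grouping, and a minimum group size plus a minimum mean pairwise score for filtering. The function name `group_genes`, its parameters, the threshold values and the toy scores are all hypothetical.

```python
from itertools import combinations


def group_genes(pair_scores, link_threshold=0.8, min_size=3, min_mean_score=0.85):
    """Group genes from pairwise scores, then keep high-confidence groups.

    pair_scores maps frozenset({gene_a, gene_b}) to a score in [0, 1]
    (assumed here to come from some genome context method).
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Step 1: link two genes only if their pairwise score is high enough.
    neighbors = {}
    for pair, score in pair_scores.items():
        if score >= link_threshold:
            a, b = tuple(pair)
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    # Step 2: group genes as connected components of the link graph
    # (an assumed stand-in for the paper's grouping step).
    seen = set()
    groups = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        seen |= component
        groups.append(component)

    # Step 3: keep groups that are large enough and whose mean pairwise
    # score (unscored pairs count as 0.0) reaches the minimum.
    kept = []
    for group in groups:
        if len(group) < min_size:
            continue
        scores = [
            pair_scores.get(frozenset(pair), 0.0)
            for pair in combinations(sorted(group), 2)
        ]
        if sum(scores) / len(scores) >= min_mean_score:
            kept.append(group)
    return kept


# Toy input with made-up gene names and scores.
toy_scores = {
    frozenset(("geneA", "geneB")): 0.9,
    frozenset(("geneB", "geneC")): 0.9,
    frozenset(("geneA", "geneC")): 0.9,
    frozenset(("geneC", "geneD")): 0.1,
    frozenset(("geneD", "geneE")): 0.95,
}
candidate_groups = group_genes(toy_scores)
print([sorted(group) for group in candidate_groups])
```

In this toy input, geneA, geneB and geneC form one candidate group, while the geneD and geneE pair is dropped for being smaller than the assumed minimum size. Connected components were chosen only because they are simple to state; any clustering over the pairwise scores would fit the same outline.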
Predicted subsystems are available at http://brg.ai.sri.com/pwy-discovery/journal.html.
Supplementary data are available at Bioinformatics online.