European Molecular Biology Laboratory, Heidelberg, Germany.
PLoS One. 2012;7(4):e34302. doi: 10.1371/journal.pone.0034302. Epub 2012 Apr 2.
The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein-coding genes, of which more than a thousand have been categorized as 'hypothetical', meaning that not even weak functional associations had been identified for them so far. Here we predict reliable functional indications for half of this large hypothetical ORFeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well-known pathways, and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism.