Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925 Observatory, Cape Town, South Africa.
Infect Genet Evol. 2012 Jul;12(5):922-32. doi: 10.1016/j.meegid.2011.10.027. Epub 2011 Nov 7.
Despite ever-increasing amounts of sequence and functional genomics data, many newly sequenced proteins still lack functional annotation. For Mycobacterium tuberculosis (MTB), more than half of the genome remains uncharacterized, which hampers the search for new drug targets within this bacterial pathogen and limits our understanding of its pathogenicity. As for many other genomes, the annotations of proteins in the MTB proteome were generally inferred from sequence homology, an approach that is effective but of limited applicability. We have carried out large-scale biological data integration to produce an MTB protein functional interaction network. Protein functional relationships were extracted from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, and additional functional interactions were derived from microarray, sequence and protein signature data. The confidence level of protein relationships in the additional functional interaction data was evaluated using a dynamic data-driven scoring system. This functional network has been used to predict the functions of uncharacterized proteins in terms of Gene Ontology (GO) terms, with the semantic similarity between these terms measured using a state-of-the-art GO similarity metric. To achieve a better trade-off between quality, genomic coverage and scalability, the prediction follows the key principles driving the biological organization of the functional network. This study yields a new functional characterization of the proteome of MTB strain CDC1551, in which 3804 and 3698 of its 4195 proteins are annotated with biological process and molecular function ontology terms, respectively. These data can contribute to research into the development of effective anti-tubercular drugs with novel biological mechanisms of action.
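The abstract does not spell out the prediction rule itself. As a point of reference only, the sketch below shows a generic weighted neighbor-voting ("guilt by association") scheme for assigning GO terms to an uncharacterized protein from its annotated neighbors in a functional interaction network. It is not the authors' method, which additionally relies on GO semantic similarity and on network-organization principles. The function name `predict_go_terms`, the edge confidences in [0, 1], the `min_score` threshold and the protein IDs are all hypothetical.

```python
from collections import defaultdict


def predict_go_terms(edges, annotations, protein, min_score=0.5):
    """Generic weighted neighbor voting (hypothetical sketch, not the paper's method).

    edges:       dict mapping (protein_a, protein_b) -> confidence in [0, 1] (assumed scale)
    annotations: dict mapping protein -> set of GO term IDs
    Returns {go_term: score}, where score is the confidence-weighted fraction of
    annotated neighbors carrying that term, keeping terms with score >= min_score.
    """
    scores = defaultdict(float)
    total_weight = 0.0
    for (a, b), weight in edges.items():
        if protein not in (a, b):
            continue  # edge does not touch the query protein
        neighbor = b if a == protein else a
        terms = annotations.get(neighbor)
        if not terms:
            continue  # unannotated neighbors cast no vote
        total_weight += weight
        for term in terms:
            scores[term] += weight
    if total_weight == 0:
        return {}  # no annotated neighbors: no prediction
    return {
        term: s / total_weight
        for term, s in scores.items()
        if s / total_weight >= min_score
    }


# Usage with hypothetical proteins and GO terms.
edges = {("P1", "P2"): 0.9, ("P1", "P3"): 0.6, ("P1", "P4"): 0.3, ("P3", "P4"): 0.8}
annotations = {
    "P2": {"GO:0006810"},
    "P3": {"GO:0006810", "GO:0008152"},
    "P4": {"GO:0008152"},
}
print(predict_go_terms(edges, annotations, "P1", min_score=0.6))
```

Here P1 keeps only GO:0006810 (score about 0.83), because its higher-confidence neighbors P2 and P3 both carry that term, while GO:0008152 reaches only about 0.5. Normalizing by the total neighbor weight keeps scores comparable between proteins with many and few neighbors.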