Ramakrishnan Gayatri, Ochoa-Montaño Bernardo, Raghavender Upadhyayula S, Mudgal Richa, Joshi Adwait G, Chandra Nagasuma R, Sowdhamini Ramanathan, Blundell Tom L, Srinivasan Narayanaswamy
Indian Institute of Science Mathematics Initiative, Indian Institute of Science, Bangalore 560012, India; Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.
Tuberculosis (Edinb). 2015 Jan;95(1):14-25. doi: 10.1016/j.tube.2014.10.009. Epub 2014 Nov 6.
The availability of the genome sequence of Mycobacterium tuberculosis H37Rv has encouraged determination of large numbers of protein structures and detailed definition of the biological information encoded therein; yet the functions of many proteins in M. tuberculosis remain unknown. The emergence of multidrug-resistant strains makes it a priority to exploit recent advances in homology recognition and structure prediction to re-analyse its gene products. Here we report the structural and functional characterization of gene products encoded in the M. tuberculosis genome, using sensitive profile-based remote homology search and fold recognition algorithms. The result is an enhanced annotation of the proteome in which 95% of the M. tuberculosis proteins were identified wholly or partly with information on structure or function. New information includes the association of 244 proteins with 205 domain families and a separate set of new associations of folds with 64 proteins. Extending structural information across uncharacterized protein families represented in the M. tuberculosis proteome, by determining superfamily relationships between families of known and unknown structures, has enhanced knowledge of the structural content. In retrospect, such superfamily relationships have facilitated recognition of probable structure and/or function for several uncharacterized protein families, eventually aiding recognition of probable functions for homologous proteins corresponding to such families. No function could be identified for 183 gene products unique to mycobacteria; of these, 18 were determined to be M. tuberculosis specific. Such pathogen-specific proteins are speculated to harbour virulence factors required for pathogenesis. A re-annotated proteome of M. tuberculosis, with greater completeness of annotated proteins and domain-assigned regions, provides a valuable basis for experimental endeavours designed to obtain a better understanding of pathogenesis and to accelerate the process of drug target discovery.