Garcia Benjamin J, Datta Gargi, Davidson Rebecca M, Strong Michael
Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA.
Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA.
BMC Genomics. 2015 Dec 24;16:1102. doi: 10.1186/s12864-015-2311-9.
Central to most omic-scale experiments is the interpretation and examination of the resulting gene lists, which correspond to differentially expressed, regulated, or observed gene or protein sets. Interpretation is complicated by the lack of functional annotation for a large percentage of genes in many microbial genomes. This is particularly noticeable in mycobacterial genomes, which are clinically extremely important yet significantly divergent from many of the microbial model species used for gene and protein functional characterization. Mycobacterial species, ranging from M. tuberculosis to M. abscessus, are responsible for deadly infectious diseases that kill over 1.5 million people worldwide each year. A better understanding of the coding capacity of mycobacterial genomes is therefore necessary to shed further light on putative mechanisms of virulence, pathogenesis, and functional adaptation.
Here we describe improved functional annotation coverage for 11 important mycobacterial genomes, many of which are involved in human diseases including tuberculosis, leprosy, and nontuberculous mycobacterial (NTM) infections. Across these 11 genomes, we provide 9899 new functional annotations, relative to NCBI and TBDB annotations, for genes previously characterized as genes of unknown function, hypothetical proteins, or conserved hypothetical proteins. The functional annotations are available at our newly developed web resource, MycoBASE (Mycobacterial Annotation Server), at strong.ucdenver.edu/mycobase.
Improved annotations allow for better understanding and interpretation of genomic and transcriptomic experiments, including analyzing the functional implications of insertions, deletions, and mutations, inferring the function of understudied genes, and determining the functional changes revealed by differential expression studies. MycoBASE provides a valuable resource for mycobacterial researchers through improved, searchable functional annotations and functional enrichment strategies. MycoBASE will be continually supported and updated to include new genomes, providing a powerful resource to help researchers better understand these important pathogenic and environmental species.