Buza Teresia J, McCarthy Fiona M, Burgess Shane C
Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, MS 39762, USA.
BMC Genomics. 2007 Nov 19;8:425. doi: 10.1186/1471-2164-8-425.
The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model (especially for studying embryology and development), its role as a source of human disease organisms, and its importance as the major source of animal-derived food protein. However, genomic sequence data are, in themselves, of limited value; a sequence alone does not amount to an understanding of biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. The sequence data currently available, however, are poorly annotated both structurally and functionally, and many genes have not been assigned standard nomenclature.
We analysed eight chicken tissues and improved the structural annotation of the chicken genome by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or were hypothetical translations in human. To improve functional annotation based on the Gene Ontology (GO), we mapped the identified proteins to their human and mouse orthologs and used this orthology to transfer GO functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced increase the number of currently available chicken GO annotations by 8%. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines.
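The orthology-based transfer described above can be illustrated with a minimal sketch. It is not the authors' pipeline. The function name, the input layout, the protein identifiers, the restriction to experimentally supported source annotations, and the use of the GO evidence code ISO (Inferred from Sequence Orthology) for the transferred annotations are all assumptions made for illustration.

```python
from collections import defaultdict

# Experimental GO evidence codes. Restricting transfer to these is an
# assumption of this sketch, not a rule stated in the abstract.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def transfer_go_annotations(orthologs, source_annotations):
    """Copy GO terms from human/mouse orthologs onto chicken proteins.

    orthologs: dict mapping a chicken protein ID to a list of ortholog IDs.
    source_annotations: dict mapping an ortholog ID to a list of
        (go_id, evidence_code) tuples.
    Returns a dict mapping each chicken protein ID to a sorted list of
    (go_id, "ISO", ortholog_id) tuples, where ortholog_id records the
    protein the annotation was taken from.
    """
    transferred = defaultdict(set)
    for chicken_id, ortholog_ids in orthologs.items():
        for ortholog_id in ortholog_ids:
            for go_id, evidence in source_annotations.get(ortholog_id, []):
                # Skip annotations that are themselves only predictions.
                if evidence in EXPERIMENTAL_CODES:
                    transferred[chicken_id].add((go_id, "ISO", ortholog_id))
    return {cid: sorted(anns) for cid, anns in transferred.items()}


if __name__ == "__main__":
    # Hypothetical identifiers; the GO IDs are used only as example values.
    orthologs = {
        "chicken_prot_1": ["human_prot_A", "mouse_prot_A"],
        "chicken_prot_2": ["human_prot_B"],
    }
    source_annotations = {
        "human_prot_A": [("GO:0005524", "IDA"), ("GO:0005634", "IEA")],
        "mouse_prot_A": [("GO:0005524", "IMP")],
        "human_prot_B": [("GO:0005634", "ISS")],
    }
    for cid, anns in transfer_go_annotations(orthologs, source_annotations).items():
        print(cid, anns)
```

In this sketch a chicken protein receives an annotation only when at least one ortholog carries an experimentally supported GO term, so proteins whose orthologs have only predicted annotations are left out of the result.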
We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. The experimentally supported predicted proteins were further annotated by assigning them standardized nomenclature and functional annotation. This method is applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and to inform gene prediction algorithms.