Zhang Chengpu, Xu Ping, Zhu Yunping
Sheng Wu Gong Cheng Xue Bao. 2014 Jul;30(7):1026-35.
With the rapid development of genome sequencing technologies, a large number of prokaryotic genomes have been sequenced in recent years. To investigate the organization and function of these genomes, annotation algorithms based on sequence and homology features have been widely applied to newly sequenced genomes. However, gene annotation that relies on genomic information alone is prone to errors, such as incorrectly assigned N-termini and misannotated pseudogenes. Producing reasonable annotations is even harder when the genome sequence itself is of poor quality. Transcriptomics, based on technologies such as microarrays and RNA-seq, and proteomics, based on tandem mass spectrometry (MS/MS), have been widely used to identify gene products with high throughput and high sensitivity, and they provide powerful tools for verifying and correcting annotated genomes. Compared with transcriptomics, proteomics yields a list of the proteins produced from the genes expressed in a sample or cell, without interference from non-coding RNA, which makes proteogenomics an important basis for genome annotation in prokaryotes. In this paper, we first describe the traditional genome annotation algorithms and point out their shortcomings. We then summarize the advantages of proteomics for genome annotation and review the progress of proteogenomics in prokaryotes. Finally, we discuss the challenges and strategies in data analysis and potential solutions for the further development of proteogenomics.