National Center for Molecular Characterization of GMOs and State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
BMC Genomics. 2011 Jul 12;12:359. doi: 10.1186/1471-2164-12-359.
One of the major tasks of the post-genomic era is "reading" genomic sequences in order to extract all the biological information they contain. Although a wide variety of techniques are used to solve the gene-finding problem and a number of prokaryotic gene-finding programs are available, gene recognition in bacteria is not always straightforward.
This study reports a thorough search for new coding sequences (CDSs) in the two published Xanthomonas campestris pv. campestris (Xcc) genomes. First, putative CDSs encoded in the two genomes were re-predicted using three gene finders, resulting in the identification of 2850 putative new CDSs. Second, similarity searching was conducted and 278 CDSs were found to have homologs in other bacterial species. Third, oligonucleotide microarray and RT-PCR analysis identified 147 CDSs with detectable mRNA transcripts. Finally, in-frame deletion and subsequent phenotype analysis confirmed that Xcc_CDS002, encoding a novel SIR2-like domain protein, is involved in virulence and that Xcc_CDS1553, encoding an ArsR family transcription factor, is involved in arsenate resistance.
Despite the sophisticated approaches available for genome annotation, many cellular transcripts in the Xcc genomes have remained unidentified so far. Through a combined strategy involving bioinformatic, post-genomic and genetic approaches, a reliable list of 306 new CDSs was identified and a more thorough understanding of some cellular processes was gained.
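The combined strategy described above can be pictured as an evidence filter: candidates from the gene finders are kept only when independent support exists. The following is a minimal sketch of that idea, not the authors' pipeline. The `Candidate` record, the finder names, the toy identifiers, and the rule "keep a candidate if it has a homolog or a detectable transcript" are all assumptions made for illustration; the abstract does not state how the final list of 306 CDSs was derived from the intermediate counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One re-predicted CDS with its supporting evidence (hypothetical record)."""
    cds_id: str
    predicted_by: frozenset  # names of the gene finders that called this CDS
    has_homolog: bool        # hit found by similarity searching
    has_transcript: bool     # mRNA detected (e.g. microarray or RT-PCR)


def supported(candidates, min_finders=1):
    """Keep candidates called by at least `min_finders` gene finders and
    backed by a homolog or a detectable transcript (assumed rule)."""
    return [
        c for c in candidates
        if len(c.predicted_by) >= min_finders
        and (c.has_homolog or c.has_transcript)
    ]


# Toy data only; these identifiers and values are invented for the example.
candidates = [
    Candidate("toy_001", frozenset({"finder_a", "finder_b"}), True, False),
    Candidate("toy_002", frozenset({"finder_c"}), False, True),
    Candidate("toy_003", frozenset({"finder_a"}), False, False),
    Candidate("toy_004", frozenset({"finder_a", "finder_b", "finder_c"}), True, True),
]

reliable = supported(candidates)
reliable_ids = [c.cds_id for c in reliable]

# A stricter variant: require agreement between at least two gene finders.
consensus_ids = [c.cds_id for c in supported(candidates, min_finders=2)]
```

Raising `min_finders` trades sensitivity for specificity: in the toy data, a candidate called by a single finder is dropped under the stricter setting even though a transcript supports it.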