McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry, Johns Hopkins University, Baltimore, Maryland 21205, USA.
Genome Res. 2011 Nov;21(11):1872-81. doi: 10.1101/gr.127951.111. Epub 2011 Jul 27.
Anopheles gambiae is a major mosquito vector of malaria, and its genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for An. gambiae are predictions that have not yet been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations, including 80 novel gene structures and about 500 translational start sites. Additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search-specific peptides. We documented numerous cases of protein-coding sequence in regions annotated as intergenic, intronic, or untranslated. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is the first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.
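The notion of a "genome search-specific peptide" can be illustrated with a minimal sketch. It assumes one plausible reading of the term: a peptide that matches a six-frame translation of the genomic sequence but does not occur in any annotated protein. The abstract does not describe the actual search pipeline, so the function names, the six-frame approach, and the toy sequences below are all hypothetical and serve only as an illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(dna):
    """Translate both strands in all three reading frames.

    Stop codons become '*'; codons with unknown bases become 'X'.
    """
    dna = dna.upper()
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def genome_specific_peptides(peptides, genome, annotated_proteins):
    """Keep peptides found in the translated genome but in no annotated protein.

    This is an illustrative definition, not the published method.
    """
    frames = six_frame_translations(genome)
    hits = []
    for pep in peptides:
        in_genome = any(pep in frame for frame in frames)
        in_annotation = any(pep in protein for protein in annotated_proteins)
        if in_genome and not in_annotation:
            hits.append(pep)
    return hits


# Toy data (hypothetical): the annotation covers only part of the coding region.
toy_genome = "ATGGCTAAGCTGTTTGAATGA"
toy_annotation = ["MAK"]
toy_peptides = ["MAK", "LFE", "WWW"]
specific = genome_specific_peptides(toy_peptides, toy_genome, toy_annotation)
```

In this toy case the peptide `LFE` is encoded by the genome but missing from the annotation, so it is the kind of evidence that could support extending or correcting a gene model. A real analysis would rely on a mass spectrometry search engine with false discovery rate control in place of exact substring matching.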