College of Animal Sciences, Zhejiang University, Hangzhou, P. R. China.
J Proteome Res. 2019 Aug 2;18(8):3009-3019. doi: 10.1021/acs.jproteome.8b00965. Epub 2019 Jul 2.
The silkworm genome has been deeply sequenced and assembled, but accurate genome annotation, which is important for modern biological research, remains far from complete. To improve silkworm genome annotation, we carried out a proteogenomics analysis using 9.8 million mass spectra collected from different tissues and developmental stages of the silkworm. The results confirmed the translational products of 4307 existing gene models and identified 1701 novel genome search-specific peptides (GSSPs). Using these GSSPs, 74 novel gene-coding sequences were identified, and 121 existing gene models were corrected. We also identified 1182 novel junction peptides based on an exon-skipping database that resulted in the identification of 973 alternative splicing sites. Furthermore, we performed RNA-seq analysis to improve silkworm genome annotation at the transcriptional level. A total of 1704 new transcripts and 1136 new exons were identified, 2581 untranslated regions (UTRs) were revised, and 1301 alternative splicing (AS) genes were identified. The transcriptomics results were integrated with the proteomics data to further complement and verify the new annotations. In addition, 14 incorrect genes and 10 skipped exons were verified using the two analysis methods. Altogether, we identified 1838 new transcripts and 1593 AS genes and revised 5074 existing genes using proteogenomics and transcriptome analyses. Data are available via ProteomeXchange with identifier PXD009672. The large-scale proteogenomics and transcriptome analyses in this study will greatly improve silkworm genome annotation and contribute to future studies.
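To illustrate what a genome search-specific peptide is, the following minimal sketch treats a GSSP as an identified peptide that maps to a six-frame translation of the genome but to none of the existing protein models. This is an assumption about the general proteogenomics idea, not the authors' actual pipeline; the function names, the toy sequence, and the simple substring matching are all hypothetical, and a real analysis would rely on a database search engine and false discovery rate control.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna):
    """Yield stop-to-stop protein segments from all six reading frames."""
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if segment:
                    yield segment


def genome_search_specific(identified, genome, annotated_proteins):
    """Peptides found in the translated genome but in no annotated protein.

    Hypothetical helper: plain substring matching stands in for a real
    spectrum-to-database search.
    """
    orfs = list(six_frame_orfs(genome))
    return {
        pep for pep in identified
        if any(pep in orf for orf in orfs)
        and not any(pep in prot for prot in annotated_proteins)
    }


# Toy example (made-up sequences): one annotated protein, one novel region.
toy_genome = "ATGGCTGGTTGGAAA" + "TAA" + "CTGTCTCCTGAAGATCGT"
toy_annotation = ["MAGWK"]
toy_identified = {"MAGWK", "LSPEDR", "QQQQQQ"}
print(sorted(genome_search_specific(toy_identified, toy_genome, toy_annotation)))
```

In this toy case only the peptide encoded downstream of the annotated gene is reported; the annotated peptide and the peptide absent from the genome are both filtered out.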