Zhu Xun, Xie Shangbo, Armengaud Jean, Xie Wen, Guo Zhaojiang, Kang Shi, Wu Qingjun, Wang Shaoli, Xia Jixing, He Rongjun, Zhang Youjun
From the ‡Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China;
§BGI-Shenzhen, Shenzhen, 518083, China;
Mol Cell Proteomics. 2016 Jun;15(6):1791-807. doi: 10.1074/mcp.M115.050989. Epub 2016 Feb 22.
The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimentally independent approach for confirming or refuting protein predictions, a concept termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement the genome annotation of the P. xylostella larval midgut, based on shotgun HPLC-ESI-MS/MS data and a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome-search-specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To obtain the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes, i.e., 235 of 439, were further validated by RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicing events. Finally, a total of 6764 proteins were identified, making this one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomic analysis of P. xylostella, this study provides a fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest.
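The six-frame translation database mentioned above can be illustrated with a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) and a plain DNA string as input; the function names are hypothetical and are not taken from the study's pipeline, which the abstract does not describe in detail.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate a DNA sequence in three forward and three reverse frames.

    Keys are '+1'..'+3' for the given strand and '-1'..'-3' for its
    reverse complement; '*' marks a stop codon.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In a proteogenomic search, each translated frame would typically be split at stop codons and the resulting stretches written to a FASTA file for the search engine; that step is omitted here for brevity.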