Armero Alix, Baudouin Luc, Bocs Stéphanie, This Dominique
Montpellier SupAgro, UMR AGAP, Montpellier, France.
CIRAD, UMR AGAP, Montpellier, France.
PLoS One. 2017 Mar 23;12(3):e0173300. doi: 10.1371/journal.pone.0173300. eCollection 2017.
Palms are a family of tropical origin and one of the main constituents of tropical ecosystems around the world. The two main palm species present different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq.) is the leading player in the oil market. In this study, we present a workflow that exploits comparative genomics between a target species (coconut) and a reference species (oil palm) to improve transcriptomic data, providing a proteome useful for answering functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed that our strategy has high sensitivity and specificity and is relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often recovering 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of which derived from new contigs obtained with the BRANCH software. The coconut proteome showed a functional profile similar to that observed in rice and a large number of metabolic pathways related to secondary metabolism. The new sequences found with the BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to obtain a representative proteome of a target species. The results of the current analysis are available on the PalmComparomics website (http://palm-comparomics.southgreen.fr/).
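The abstract reports two quantities, a redundancy reduction factor and a gain in protein length, without stating how they were computed. The Python sketch below shows one plausible way to measure them on toy data. It is not the authors' implementation: the function names, the identifiers, and the definitions (redundancy as target sequences per reference protein, coverage as target length over reference length, capped at 1) are all assumptions made for illustration.

```python
from collections import defaultdict

def redundancy(assignments):
    """Mean number of target sequences assigned to each reference protein.

    `assignments` maps a target sequence id to the id of its best-matching
    reference protein (hypothetical input; how hits are found is out of scope).
    """
    groups = defaultdict(list)
    for target_id, ref_id in assignments.items():
        groups[ref_id].append(target_id)
    return len(assignments) / len(groups)

def mean_coverage(target_lengths, assignments, ref_lengths):
    """Mean fraction of the reference protein length reached by each target
    sequence, capped at 1.0 so that longer targets do not inflate the mean."""
    fractions = [
        min(target_lengths[t] / ref_lengths[r], 1.0)
        for t, r in assignments.items()
    ]
    return sum(fractions) / len(fractions)

# Toy reference proteome: two proteins with invented lengths (amino acids).
ref_len = {"refA": 100, "refB": 200}

# Before the workflow: six fragmented, redundant translated contigs.
before = {"t1": "refA", "t2": "refA", "t3": "refA",
          "t4": "refB", "t5": "refB", "t6": "refB"}
before_len = {"t1": 50, "t2": 40, "t3": 60, "t4": 100, "t5": 80, "t6": 120}

# After the workflow: one longer protein per reference protein.
after = {"p1": "refA", "p2": "refB"}
after_len = {"p1": 100, "p2": 180}

reduction_factor = redundancy(before) / redundancy(after)
coverage_before = mean_coverage(before_len, before, ref_len)
coverage_after = mean_coverage(after_len, after, ref_len)

print(f"redundancy reduction factor: {reduction_factor:.2f}")
print(f"mean coverage before: {coverage_before:.2f}, after: {coverage_after:.2f}")
```

In this toy case, six fragments collapse to two proteins, so the redundancy factor falls from 3 to 1 while mean coverage rises. The numbers are invented and have no relation to the values reported for A. thaliana or coconut.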