Scarcelli N, Mariac C, Couvreur T L P, Faye A, Richard D, Sabot F, Berthouly-Salazar C, Vigouroux Y
UMR DIADE, IRD Montpellier, 911 avenue Agropolis, 34394, Montpellier Cedex 5, France.
Département des Sciences Biologiques, Laboratoire de Botanique Systématique et d'Ecologie, Ecole Normale Supérieure, Université de Yaoundé I, BP 047, Yaoundé, Cameroon.
Mol Ecol Resour. 2016 Mar;16(2):434-45. doi: 10.1111/1755-0998.12462. Epub 2015 Sep 20.
Next-generation sequencing (NGS) provides access to large quantities of genomic data. In plants, several studies have used whole chloroplast genome sequences to infer phylogeography or phylogeny. Although the chloroplast is a haploid organelle, NGS plastome data have revealed a non-negligible number of intra-individual polymorphic SNPs. Such observations could have several causes, such as sequencing errors, heteroplasmy, or the transfer of chloroplast sequences into the nuclear and mitochondrial genomes. The occurrence of this intra-individual allelic diversity has important practical consequences for the identification of diversity and the analysis of chloroplast data and, beyond that, raises significant evolutionary questions. In this study, we show that the observed intra-individual polymorphism in chloroplast sequence data probably results from plastid DNA transferred into the mitochondrial and/or nuclear genomes. We further assess the error rates of nine bioinformatics pipelines for SNP and genotype calling, using SNPs identified by Sanger sequencing as a reference. Specific pipelines handle this issue adequately, optimizing both specificity and sensitivity. Our results will allow proper use of whole-chloroplast NGS sequences and better handling of NGS chloroplast sequence diversity.
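The pipeline comparison described above rests on scoring NGS SNP calls against Sanger-validated sites. As a minimal sketch, assuming SNPs are represented as sets of positions for a single individual and that every Sanger-covered position is either a confirmed SNP or confirmed invariant, sensitivity and specificity can be computed as follows. The function and class names and the toy data are hypothetical illustrations, not the study's actual pipelines or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallMetrics:
    """Confusion counts for one pipeline compared against a Sanger reference."""
    tp: int  # SNPs called by the pipeline and confirmed by Sanger
    fp: int  # SNPs called by the pipeline but invariant in Sanger
    fn: int  # Sanger SNPs missed by the pipeline
    tn: int  # invariant sites correctly left uncalled

    @property
    def sensitivity(self) -> float:
        # Fraction of Sanger-confirmed SNPs that the pipeline also called.
        denom = self.tp + self.fn
        return self.tp / denom if denom else float("nan")

    @property
    def specificity(self) -> float:
        # Fraction of Sanger-invariant sites that the pipeline left uncalled.
        denom = self.tn + self.fp
        return self.tn / denom if denom else float("nan")


def score_snp_calls(assessed_sites, sanger_snps, pipeline_snps) -> CallMetrics:
    """Hypothetical helper: compare pipeline SNP positions with Sanger SNPs.

    Only positions in `assessed_sites` (sites covered by Sanger reads) are
    scored; calls outside that window cannot be verified and are ignored.
    """
    assessed = set(assessed_sites)
    truth = set(sanger_snps) & assessed
    called = set(pipeline_snps) & assessed
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = len(assessed) - tp - fp - fn
    return CallMetrics(tp=tp, fp=fp, fn=fn, tn=tn)


# Toy example with hypothetical positions (not data from the study).
assessed = range(1, 1001)    # 1,000 Sanger-covered positions
sanger = {120, 455, 780}     # Sanger-confirmed SNPs
pipeline = {120, 455, 600}   # SNPs reported by one NGS pipeline
m = score_snp_calls(assessed, sanger, pipeline)
print(m.tp, m.fp, m.fn, m.tn)                           # 2 1 1 996
print(round(m.sensitivity, 3), round(m.specificity, 3))  # 0.667 0.999
```

Restricting the comparison to Sanger-covered positions keeps true negatives well defined; a pipeline that reports spurious intra-individual SNPs (for example, from nuclear or mitochondrial copies of plastid DNA) would show up here as extra false positives and lower specificity.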