Dealing with paralogy in RADseq data: in silico detection and single nucleotide polymorphism validation in L.

Author information

Verdu Cindy F, Guichoux Erwan, Quevauvillers Samuel, De Thier Olivier, Laizet Yec'han, Delcamp Adline, Gévaudant Frédéric, Monty Arnaud, Porté Annabel J, Lejeune Philippe, Lassois Ludivine, Mariette Stéphanie

Affiliations

Forest Management Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.

Biogeco, INRA, University of Bordeaux, Cestas, France.

Publication information

Ecol Evol. 2016 Sep 22;6(20):7323-7333. doi: 10.1002/ece3.2466. eCollection 2016 Oct.

Abstract

The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralog loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a very variable number of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: A low threshold decreased the chances of detecting minor alleles while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs, suitable for use in population genetics applications.
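To make the dependence on the depth-of-coverage threshold concrete, here is a minimal sketch of one common heuristic for flagging putative paralogs. It is not taken from the paper: the abstract does not state the detection criterion, so the rule used here (a diploid individual showing more than two alleles at a locus, each supported by at least `min_depth` reads) is an assumption, as are all function names and the toy data. The mismatch threshold between loci is not modeled.

```python
from collections import Counter

def alleles_above_depth(reads, min_depth):
    """Return the distinct sequences at a locus supported by >= min_depth reads."""
    counts = Counter(reads)
    return {seq for seq, n in counts.items() if n >= min_depth}

def is_putative_paralog(reads_by_individual, min_depth, ploidy=2):
    """Assumed heuristic: flag the locus if any individual shows more
    alleles than its ploidy allows at the given depth threshold."""
    return any(
        len(alleles_above_depth(reads, min_depth)) > ploidy
        for reads in reads_by_individual.values()
    )

def paralog_fraction(loci, min_depth, ploidy=2):
    """Fraction of loci flagged as putative paralogs."""
    if not loci:
        return 0.0
    flagged = sum(
        is_putative_paralog(reads_by_ind, min_depth, ploidy)
        for reads_by_ind in loci.values()
    )
    return flagged / len(loci)

# Hypothetical toy data: locus -> individual -> list of read sequences.
toy_loci = {
    "locus_1": {
        "ind_A": ["ACGT"] * 6 + ["ACGA"] * 5,
        "ind_B": ["ACGT"] * 10,
    },
    "locus_2": {
        # Three well-supported sequences in one diploid individual.
        "ind_A": ["TTGC"] * 8 + ["TTGA"] * 7 + ["TAGA"] * 5,
        "ind_B": ["TTGC"] * 9,
    },
}

for depth in (4, 8):
    print(depth, paralog_fraction(toy_loci, depth))
```

In this toy case the third sequence at `locus_2` passes a threshold of 4 reads but not 8, so the locus is flagged only at the lower threshold. At the higher threshold both alleles of `ind_A` at `locus_1` also fall below the cutoff, a simple picture of how a strict depth filter can drop real alleles.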

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d13/5513258/14a9e97d32dc/ECE3-6-7323-g001.jpg
