Dealing with paralogy in RADseq data: in silico detection and single nucleotide polymorphism validation in Robinia pseudoacacia L.

Author information

Verdu Cindy F, Guichoux Erwan, Quevauvillers Samuel, De Thier Olivier, Laizet Yec'han, Delcamp Adline, Gévaudant Frédéric, Monty Arnaud, Porté Annabel J, Lejeune Philippe, Lassois Ludivine, Mariette Stéphanie

Affiliations

Forest Management Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.

Biogeco, INRA, University of Bordeaux, Cestas, France.

Publication information

Ecol Evol. 2016 Sep 22;6(20):7323-7333. doi: 10.1002/ece3.2466. eCollection 2016 Oct.

DOI:10.1002/ece3.2466

PMID:28725400

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5513258/

Abstract

The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralog loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a very variable number of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: A low threshold decreased the chances of detecting minor alleles while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs, suitable for use in population genetics applications.
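The trade-off described in the abstract between read depth and allele detection can be illustrated with a simple binomial model. The sketch below is not the authors' analysis: it assumes a heterozygous diploid locus where each read samples either allele with probability 0.5, with no sequencing error or amplification bias. The function name `prob_allele_below_threshold` and its parameters are hypothetical and introduced only for illustration.

```python
from math import comb


def prob_allele_below_threshold(depth: int, min_reads: int = 1) -> float:
    """Probability that at least one allele of a heterozygote is supported
    by fewer than `min_reads` reads, given `depth` reads at the locus.

    Assumption: each read carries either allele with probability 0.5.
    """
    if depth < 0 or min_reads < 0:
        raise ValueError("depth and min_reads must be non-negative")
    total = 0.0
    for k in range(depth + 1):  # k = number of reads carrying the first allele
        # Count outcomes where either allele falls short of the requirement.
        if k < min_reads or depth - k < min_reads:
            total += comb(depth, k) * 0.5 ** depth
    return total


if __name__ == "__main__":
    # Chance of seeing only one of the two alleles at several locus depths.
    for depth in (4, 8, 18):
        print(depth, prob_allele_below_threshold(depth))
```

Under these assumptions, a heterozygote covered by only a few reads has a non-negligible chance of appearing homozygous. Conversely, raising `min_reads` at a fixed `depth` increases the probability that one allele fails the requirement, which is loosely analogous to allelic dropout under a stringent threshold.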


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d13/5513258/14a9e97d32dc/ECE3-6-7323-g001.jpg
