Haplotyping RAD loci: an efficient method to filter paralogs and account for physical linkage.

Affiliations

Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, 6300 Ocean Drive, Corpus Christi, TX, 78412, USA.

Marine Science Center, Northeastern University, 430 Nahant RD, Nahant, MA, 01908, USA.

Publication information

Mol Ecol Resour. 2017 Sep;17(5):955-965. doi: 10.1111/1755-0998.12647. Epub 2017 Feb 9.

DOI: 10.1111/1755-0998.12647

PMID:28042915

Abstract

Next-generation sequencing of reduced-representation genomic libraries provides a powerful methodology for genotyping thousands of single-nucleotide polymorphisms (SNPs) among individuals of nonmodel species. Utilizing genotype data in the absence of a reference genome, however, presents a number of challenges. One major challenge is the trade-off between splitting alleles at a single locus into separate clusters (loci), creating inflated homozygosity, and lumping multiple loci into a single contig (locus), creating artefacts and inflated heterozygosity. This issue has been addressed primarily through the use of similarity cut-offs in sequence clustering. Here, two commonly employed, postclustering filtering methods (read depth and excess heterozygosity) used to identify incorrectly assembled loci are compared with haplotyping, another postclustering filtering approach. Simulated and empirical data sets were used to demonstrate that each of the three methods separately identified incorrectly assembled loci; more optimal results were achieved when the three methods were applied in combination. The results confirmed that including incorrectly assembled loci in population-genetic data sets inflates estimates of heterozygosity and deflates estimates of population divergence. Additionally, at low levels of population divergence, physical linkage between SNPs within a locus created artificial clustering in analyses that assume markers are independent. Haplotyping SNPs within a locus effectively neutralized the physical linkage issue without having to thin data to a single SNP per locus. We introduce a Perl script that haplotypes polymorphisms, using data from single or paired-end reads, and identifies potentially problematic loci.
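
The haplotyping filter described in the abstract can be illustrated with a short sketch. This is a minimal Python example written under stated assumptions, not the authors' Perl script. It reduces each read to its bases at the SNP positions and keeps haplotypes supported by enough reads. A diploid individual showing more than two haplotypes is treated as a failed call, and the locus is flagged as a possible lumped paralog when too many individuals fail. The functions `call_haplotypes` and `haplotype_locus` are hypothetical, as are the thresholds (`MIN_READS_PER_HAPLOTYPE`, `max_fail_fraction`) and the toy reads. Reporting each genotype as a haplotype pair makes the locus a single multi-allelic marker, so physically linked SNPs on the same read no longer enter downstream analyses as independent markers.

```python
from collections import Counter

# Hypothetical thresholds chosen for illustration only.
MIN_READS_PER_HAPLOTYPE = 2  # rarer haplotypes are treated as sequencing error
MAX_HAPLOTYPES_DIPLOID = 2   # a diploid individual carries at most two haplotypes


def call_haplotypes(reads, snp_positions, min_reads=MIN_READS_PER_HAPLOTYPE):
    """Reduce each read to its bases at snp_positions; keep well-supported haplotypes.

    Reads too short to cover every SNP are skipped. Returns a sorted list.
    """
    if not snp_positions:
        raise ValueError("a locus needs at least one SNP position")
    last = max(snp_positions)
    counts = Counter(
        "".join(read[p] for p in snp_positions)
        for read in reads
        if last < len(read)
    )
    return sorted(h for h, n in counts.items() if n >= min_reads)


def haplotype_locus(reads_by_individual, snp_positions, max_fail_fraction=0.1):
    """Genotype one RAD locus as haplotype pairs and flag possible paralogs.

    Returns (genotypes, flagged). genotypes maps each individual to a
    (hap1, hap2) tuple, or None if no call is possible. An individual with
    more than two supported haplotypes is a failed call; the locus is
    flagged when the fraction of failed individuals exceeds max_fail_fraction.
    """
    genotypes, n_fail = {}, 0
    for ind, reads in reads_by_individual.items():
        haps = call_haplotypes(reads, snp_positions)
        if len(haps) > MAX_HAPLOTYPES_DIPLOID:
            genotypes[ind] = None  # too many haplotypes: possible lumped paralogs
            n_fail += 1
        elif len(haps) == 2:
            genotypes[ind] = (haps[0], haps[1])  # heterozygote
        elif len(haps) == 1:
            genotypes[ind] = (haps[0], haps[0])  # homozygote
        else:
            genotypes[ind] = None  # no haplotype reached min_reads
    flagged = n_fail / max(len(reads_by_individual), 1) > max_fail_fraction
    return genotypes, flagged


# Toy locus with SNPs at read positions 0, 1 and 4 (made-up data).
reads = {
    "ind1": ["ACGTA", "ACGTA", "ATGTC", "ATGTC"],
    "ind2": ["ACGTA", "ACGTA", "ACGTA"],
    "ind3": ["ACGTA", "ACGTA", "ATGTC", "ATGTC", "GCGTC", "GCGTC"],
}
genotypes, flagged = haplotype_locus(reads, snp_positions=[0, 1, 4])
print(genotypes)  # {'ind1': ('ACA', 'ATC'), 'ind2': ('ACA', 'ACA'), 'ind3': None}
print(flagged)    # True: 1 of 3 individuals shows three haplotypes
```

In this toy case, `ind3` carries three well-supported haplotypes. No single diploid locus can produce that pattern, so the locus is flagged. In practice this sketch would be combined with the read-depth and excess-heterozygosity filters, which the abstract reports work best in combination.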

Similar articles

Haplotyping RAD loci: an efficient method to filter paralogs and account for physical linkage.

Mol Ecol Resour. 2017 Sep;17(5):955-965. doi: 10.1111/1755-0998.12647. Epub 2017 Feb 9.

Genotyping-in-Thousands by sequencing (GT-seq): A cost effective SNP genotyping method based on custom amplicon sequencing.

Mol Ecol Resour. 2015 Jul;15(4):855-67. doi: 10.1111/1755-0998.12357. Epub 2014 Dec 25.

SNP discovery in nonmodel organisms: strand bias and base-substitution errors reduce conversion rates.

Mol Ecol Resour. 2015 Jul;15(4):723-36. doi: 10.1111/1755-0998.12343. Epub 2014 Nov 23.

Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality genotypes and the accuracy of imputation.

BMC Genet. 2017 Apr 5;18(1):32. doi: 10.1186/s12863-017-0501-y.

Finding the right coverage: the impact of coverage and sequence quality on single nucleotide polymorphism genotyping error rates.

Mol Ecol Resour. 2016 Jul;16(4):966-78. doi: 10.1111/1755-0998.12519. Epub 2016 Mar 25.

Fast and cost-effective single nucleotide polymorphism (SNP) detection in the absence of a reference genome using semideep next-generation Random Amplicon Sequencing (RAMseq).

Mol Ecol Resour. 2018 Jan;18(1):107-117. doi: 10.1111/1755-0998.12717. Epub 2017 Oct 9.

Double-digest RAD sequencing using Ion Proton semiconductor platform (ddRADseq-ion) with nonmodel organisms.

Mol Ecol Resour. 2015 Nov;15(6):1316-29. doi: 10.1111/1755-0998.12406. Epub 2015 Apr 6.

Genomic survey sequencing for development and validation of single-locus SSR markers in peanut (Arachis hypogaea L.).

BMC Genomics. 2016 Jun 1;17:420. doi: 10.1186/s12864-016-2743-x.

A resource of genome-wide single-nucleotide polymorphisms generated by RAD tag sequencing in the critically endangered European eel.

Mol Ecol Resour. 2013 Jul;13(4):706-14. doi: 10.1111/1755-0998.12117. Epub 2013 May 9.

An ultra-high density genetic linkage map of perennial ryegrass (Lolium perenne) using genotyping by sequencing (GBS) based on a reference shotgun genome assembly.

Ann Bot. 2016 Jul;118(1):71-87. doi: 10.1093/aob/mcw081. Epub 2016 Jun 6.

Cited by

Genetic diversity and demographic history of the largest remaining migratory population of brindled wildebeest (Connochaetes taurinus taurinus) in southern Africa.

PLoS One. 2025 Apr 24;20(4):e0310580. doi: 10.1371/journal.pone.0310580. eCollection 2025.

Continuity in morphological disparity in tropical reef fishes across evolutionary scales.

Commun Biol. 2025 Feb 17;8(1):252. doi: 10.1038/s42003-025-07634-7.

Population Genomics of the Blue Shark, Prionace glauca, Reveals Different Populations in the Mediterranean Sea and the Northeast Atlantic.

Evol Appl. 2024 Sep 17;17(9):e70005. doi: 10.1111/eva.70005. eCollection 2024 Sep.

Association mapping and candidate gene identification for yield traits in European hazelnut (Corylus avellana L.).

Plant Direct. 2024 Aug 20;8(8):e625. doi: 10.1002/pld3.625. eCollection 2024 Sep.

Next-generation data filtering in the genomics era.

Nat Rev Genet. 2024 Nov;25(11):750-767. doi: 10.1038/s41576-024-00738-6. Epub 2024 Jun 14.

Complex patterns of genetic population structure in the mouthbrooding marine catfish, , in the Gulf of Mexico and U.S. Atlantic.

Ecol Evol. 2024 Jun 9;14(6):e11514. doi: 10.1002/ece3.11514. eCollection 2024 Jun.

Estimating microhaplotype allele frequencies from low-coverage or pooled sequencing data.

BMC Bioinformatics. 2023 Nov 3;24(1):415. doi: 10.1186/s12859-023-05554-z.

Population genetic structure and hybrid zone analyses for species delimitation in the Japanese toad (Bufo japonicus).

PeerJ. 2023 Oct 24;11:e16302. doi: 10.7717/peerj.16302. eCollection 2023.

Spatial and temporal patterns in the population genomics of the European cockchafer in the Alpine region.

Evol Appl. 2023 Sep 1;16(9):1586-1597. doi: 10.1111/eva.13588. eCollection 2023 Sep.

Easy-to-use R functions to separate reduced-representation genomic datasets into sex-linked and autosomal loci, and conduct sex assignment.

Mol Ecol Resour. 2025 Jul;25(5):e13844. doi: 10.1111/1755-0998.13844. Epub 2023 Aug 1.
