Haplotyping RAD loci: an efficient method to filter paralogs and account for physical linkage.

Author information

Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, 6300 Ocean Drive, Corpus Christi, TX, 78412, USA.

Marine Science Center, Northeastern University, 430 Nahant RD, Nahant, MA, 01908, USA.

Publication information

Mol Ecol Resour. 2017 Sep;17(5):955-965. doi: 10.1111/1755-0998.12647. Epub 2017 Feb 9.

Abstract

Next-generation sequencing of reduced-representation genomic libraries provides a powerful methodology for genotyping thousands of single-nucleotide polymorphisms (SNPs) among individuals of nonmodel species. Utilizing genotype data in the absence of a reference genome, however, presents a number of challenges. One major challenge is the trade-off between splitting alleles at a single locus into separate clusters (loci), creating inflated homozygosity, and lumping multiple loci into a single contig (locus), creating artefacts and inflated heterozygosity. This issue has been addressed primarily through the use of similarity cut-offs in sequence clustering. Here, two commonly employed, postclustering filtering methods (read depth and excess heterozygosity) used to identify incorrectly assembled loci are compared with haplotyping, another postclustering filtering approach. Simulated and empirical data sets were used to demonstrate that each of the three methods separately identified incorrectly assembled loci; more optimal results were achieved when the three methods were applied in combination. The results confirmed that including incorrectly assembled loci in population-genetic data sets inflates estimates of heterozygosity and deflates estimates of population divergence. Additionally, at low levels of population divergence, physical linkage between SNPs within a locus created artificial clustering in analyses that assume markers are independent. Haplotyping SNPs within a locus effectively neutralized the physical linkage issue without having to thin data to a single SNP per locus. We introduce a Perl script that haplotypes polymorphisms, using data from single or paired-end reads, and identifies potentially problematic loci.
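The haplotype-based filter described in the abstract can be illustrated with a minimal sketch. This is not the authors' Perl script; it only assumes the general idea that a diploid individual should carry at most two haplotypes at a correctly assembled locus, so a locus where individuals show more than that may combine reads from paralogs. The function names, the `min_reads` and `max_flagged` thresholds, and the toy reads are all hypothetical.

```python
from collections import Counter


def observed_haplotypes(reads, snp_positions, min_reads=2):
    """Return the SNP-allele strings seen in at least `min_reads` reads.

    `min_reads` is an assumed threshold meant to discount haplotypes that
    are supported only by a stray read (e.g. a sequencing error).
    """
    counts = Counter("".join(read[p] for p in snp_positions) for read in reads)
    return {hap for hap, n in counts.items() if n >= min_reads}


def flag_possible_paralog(reads_by_individual, snp_positions, ploidy=2, max_flagged=0):
    """Flag a locus when too many individuals carry more haplotypes than ploidy allows.

    Returns (is_flagged, list_of_individuals_with_excess_haplotypes).
    `max_flagged` is an assumed tolerance for the number of such individuals.
    """
    flagged = [
        ind
        for ind, reads in reads_by_individual.items()
        if len(observed_haplotypes(reads, snp_positions)) > ploidy
    ]
    return len(flagged) > max_flagged, flagged


# Toy locus with SNPs at read positions 1 and 4 (0-based), made-up reads.
locus_reads = {
    "ind1": ["AACGTA", "AACGTA", "ATCGCA", "ATCGCA"],  # haplotypes AT, TC
    "ind2": ["AACGTA", "AACGTA", "ATCGCA", "ATCGCA",
             "AACGCA", "AACGCA"],                       # haplotypes AT, TC, AC
}
is_paralog, individuals = flag_possible_paralog(locus_reads, snp_positions=[1, 4])
print(is_paralog, individuals)
```

In this toy example `ind2` shows three well-supported haplotypes, so the locus is flagged. Because each individual is scored by whole haplotypes rather than by separate SNPs, the same representation also treats linked SNPs within a locus as a single multi-allelic marker.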
