Suitability of Different Mapping Algorithms for Genome-Wide Polymorphism Scans with Pool-Seq Data.

Author information

Kofler Robert, Langmüller Anna Maria, Nouhaud Pierre, Otte Kathrin Anna, Schlötterer Christian

Affiliations

Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210 Wien, Austria.

Vienna Graduate School of Population Genetics, 1210, Austria.

Publication information

G3 (Bethesda). 2016 Nov 8;6(11):3507-3515. doi: 10.1534/g3.116.034488.

DOI: 10.1534/g3.116.034488

PMID: 27613752

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5100849/

Abstract

The cost-effectiveness of sequencing pools of individuals (Pool-Seq) provides the basis for the popularity and widespread use of this method for many research questions, ranging from unraveling the genetic basis of complex traits to the clonal evolution of cancer cells. Because the accuracy of Pool-Seq could be affected by many potential sources of error, several studies have determined, for example, the influence of sequencing technology, the library preparation protocol, and mapping parameters. Nevertheless, the impact of the mapping tools has not yet been evaluated. Using simulated and real Pool-Seq data, we demonstrate a substantial impact of the mapping tools, leading to characteristic false positives in genome-wide scans. The problem of false positives was particularly pronounced when data with different read lengths and insert sizes were compared. Of the 14 evaluated algorithms, novoalign, bwa mem, and clc4 are the most suitable for mapping Pool-Seq data. Nevertheless, no single algorithm is sufficient for avoiding all false positives. We show that the intersection of the results of two mapping algorithms provides a simple, yet effective, strategy to eliminate false positives. We propose that the implementation of a consistent Pool-Seq bioinformatics pipeline, building on the recommendations of this study, can substantially increase the reliability of Pool-Seq results, in particular when libraries generated with different protocols are being compared.
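The abstract reports that intersecting the results of two mapping algorithms removes many false positives. Below is a minimal sketch of that filtering step. It assumes each mapper's genome-wide scan has already been reduced to one p-value per SNP. The data layout, the function names `significant_sites` and `intersect_mappers`, and the threshold `alpha` are hypothetical illustrations, not the authors' pipeline.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a SNP is identified by (chromosome, position).
Site = Tuple[str, int]


def significant_sites(pvalues: Dict[Site, float], alpha: float) -> Set[Site]:
    """Return the sites whose test p-value falls below the threshold alpha."""
    return {site for site, p in pvalues.items() if p < alpha}


def intersect_mappers(
    scan_a: Dict[Site, float], scan_b: Dict[Site, float], alpha: float = 1e-5
) -> Set[Site]:
    """Keep only candidates that are significant in the scans from BOTH mappers.

    Sites flagged by only one mapper are discarded as likely mapping artefacts.
    The default alpha is an arbitrary placeholder, not a value from the paper.
    """
    return significant_sites(scan_a, alpha) & significant_sites(scan_b, alpha)


if __name__ == "__main__":
    # Made-up p-values standing in for scans of the same Pool-Seq data mapped
    # with two different aligners; chromosome labels are arbitrary.
    bwa_mem_scan = {("2L", 1000): 1e-8, ("2L", 2500): 1e-7, ("3R", 40): 0.3}
    novoalign_scan = {("2L", 1000): 1e-9, ("2L", 2500): 0.2, ("3R", 40): 0.4}
    print(sorted(intersect_mappers(bwa_mem_scan, novoalign_scan)))
```

With the toy input, only `("2L", 1000)` survives. The site `("2L", 2500)` is significant with one mapper but not the other, so the intersection removes it. This is the kind of mapper-specific signal the intersection strategy is meant to filter out.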


Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

不同映射算法对Pool-Seq数据全基因组多态性扫描的适用性

Suitability of Different Mapping Algorithms for Genome-Wide Polymorphism Scans with Pool-Seq Data.

作者信息

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献

不同映射算法对Pool-Seq数据全基因组多态性扫描的适用性

Suitability of Different Mapping Algorithms for Genome-Wide Polymorphism Scans with Pool-Seq Data.

作者信息

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献