Next Generation Sequencing of Pooled Samples: Guideline for Variants' Filtering.

Author information

Anand Santosh, Mangano Eleonora, Barizzone Nadia, Bordoni Roberta, Sorosina Melissa, Clarelli Ferdinando, Corrado Lucia, Martinelli Boneschi Filippo, D'Alfonso Sandra, De Bellis Gianluca

Affiliations

Institute for Biomedical Technologies, National Research Council, Segrate (MI), Italy.

Department of Science and Technology, University of Sannio, Benevento, Italy.

Publication information

Sci Rep. 2016 Sep 27;6:33735. doi: 10.1038/srep33735.

Abstract

Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is a cost- and time-effective alternative in which DNA from several individuals is pooled for sequencing. However, pooling DNA creates new challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors can be confounded with alleles present at low frequency in the pools, possibly giving rise to false-positive variants. We sequenced 996 individuals in 83 pools (12 individuals per pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and with in-house SNP-genotyping data from the individual subjects in the pools. Furthermore, we propose a simple filtering guideline, based on the Kolmogorov-Smirnov statistical test, for the removal of spurious variants. We experimentally validated our filters by comparing Pool-seq data to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic and could easily be applied in other Pool-seq experiments.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea5c/5037392/28d6048d187e/srep33735-f1.jpg
