Research Department of Non-Coronary Heart Diseases, Almazov National Medical Research Center, Ministry of Health of Russia, 2 Akkuratova St., St. Petersburg, 197341, Russia.
All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 3 Podbelsky Ch., St. Petersburg - Pushkin, 196608, Russia.
BMC Bioinformatics. 2019 Jan 22;20(1):45. doi: 10.1186/s12859-019-2616-9.
Sample pooling is widely used to reduce the cost and labour of a study. DNA sample pooling combined with massively parallel sequencing is a powerful tool for discovering DNA variants (polymorphisms) in large populations, and it underpins research fields such as genome-wide association studies and evolutionary and population studies. Using overlapping pools, in which each sample is present in multiple pools, can improve the accuracy of polymorphism detection and makes it possible to identify carriers of rare variants. Surprisingly, there is a lack of tools for interpreting the results and identifying carriers, i.e. for "depooling".
Here we present s-dePooler, an application for analysing data from pooling experiments. s-dePooler uses the variant information (a VCF file) and the pooling scheme to produce a list of candidate carriers for each polymorphism. We incorporated s-dePooler into a pipeline (dePoP) that automates pooling analysis. The performance of the pipeline was tested on a synthetic dataset built from 1000 Genomes Project data, resulting in the successful identification of 97% of carriers of polymorphisms carried by fewer than ~10% of samples.
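The core idea of carrier identification in overlapping pools can be illustrated with a minimal sketch. It is not the s-dePooler algorithm itself, only a simple set-based rule under stated assumptions: variant calls are error-free, and a sample is a candidate carrier only if the variant is observed in every pool that contains it. The function name `candidate_carriers`, the sample and pool labels, and the 2x2 grid design are all hypothetical.

```python
from typing import Dict, List, Set


def candidate_carriers(scheme: Dict[str, Set[str]], positive_pools: Set[str]) -> List[str]:
    """Return samples whose pools all show the variant.

    scheme maps each sample to the set of pools it was added to
    (an assumed, simplified representation of a pooling scheme).
    positive_pools is the set of pools in which the variant was called.
    """
    return sorted(
        sample
        for sample, pools in scheme.items()
        if pools and pools <= positive_pools  # every pool of the sample is positive
    )


# Hypothetical 2x2 grid design: each sample sits in one row pool and one column pool.
scheme = {
    "S1": {"row1", "col1"},
    "S2": {"row1", "col2"},
    "S3": {"row2", "col1"},
    "S4": {"row2", "col2"},
}

# The variant is seen only in pools row1 and col2, so S2 is the only candidate.
print(candidate_carriers(scheme, {"row1", "col2"}))  # ['S2']
```

When several samples carry the same variant, more pools become positive and this rule can return non-carriers as well, which is why the output is best read as a list of candidates rather than a definitive assignment.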
s-dePooler, together with dePoP, can be used to identify carriers of polymorphisms in overlapping pools and is compatible with any pooling scheme in which samples are pooled in equal molar ratios. s-dePooler and dePoP, with usage instructions and test data, are freely available at https://github.com/lab9arriam/depop.
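The equal-molar-ratio requirement matters because it fixes the allele fraction that a carrier is expected to contribute to a pool. The sketch below works through that arithmetic. It assumes diploid samples, which the abstract does not state, and the function name `expected_alt_fraction` is hypothetical.

```python
from fractions import Fraction


def expected_alt_fraction(pool_size: int, alt_alleles: int, ploidy: int = 2) -> Fraction:
    """Expected fraction of variant alleles in an equimolar pool.

    pool_size is the number of samples in the pool; alt_alleles is the total
    number of variant allele copies carried by those samples. Diploidy is an
    assumption made for this illustration.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    total_alleles = pool_size * ploidy
    if not 0 <= alt_alleles <= total_alleles:
        raise ValueError("alt_alleles must be between 0 and pool_size * ploidy")
    return Fraction(alt_alleles, total_alleles)


# One heterozygous carrier in a pool of 12 diploid samples.
print(expected_alt_fraction(12, 1))  # 1/24
```

If samples were pooled in unequal amounts, this expectation would no longer hold, and the observed allele fraction could not be related to the number of carriers in such a simple way.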