Weissensteiner Hansi, Forer Lukas, Fendt Liane, Kheirkhah Azin, Salas Antonio, Kronenberg Florian, Schoenherr Sebastian
Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, 6020 Innsbruck, Austria.
Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Sanitarias (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15782, Galicia, Spain.
Genome Res. 2021 Feb;31(2):309-316. doi: 10.1101/gr.256545.119. Epub 2021 Jan 15.
Within-species contamination is a major issue in sequencing studies, especially for mitochondrial studies. Contamination can be detected by analyzing the nuclear genome (nDNA) or by inspecting polymorphic sites in the mitochondrial genome (mtDNA). Existing methods using the nuclear genome are computationally expensive, and no appropriate tool for detecting sample contamination in large-scale mtDNA data sets is available. Here we present haplocheck, a tool that requires only the mtDNA to detect contamination in both targeted mitochondrial and whole-genome sequencing studies. Our in silico simulations and amplicon mixture experiments indicate that haplocheck detects mtDNA contamination accurately and that its accuracy is independent of the phylogenetic distance within a sample mixture. By applying haplocheck to data from The 1000 Genomes Project Consortium, we further evaluate haplocheck as a fast mtDNA-based proxy for nDNA-based contamination detection and identify the mitochondrial copy number within a mixture as a critical component for the overall accuracy. The haplocheck tool is available both as a command-line tool and as a cloud web service that produces interactive reports, which facilitate navigation through the phylogeny of contaminated samples.
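To make the idea of inspecting polymorphic mtDNA sites concrete, the following is a minimal sketch only. It is not haplocheck's algorithm, which the abstract does not describe. It assumes a sample is flagged when several sites carry a minor allele above a small read-fraction cutoff, and it takes the median minor-allele fraction as a rough contamination level. The function name, the thresholds, and the input layout are hypothetical.

```python
from statistics import median


def estimate_contamination(sites, min_minor_fraction=0.01, min_sites=3):
    """Illustrative heuristic, not haplocheck's method.

    sites: dict mapping an mtDNA position to (major_reads, minor_reads).
    Returns (flagged, level), where level is the median minor-allele
    fraction across sites that pass the cutoff.
    Both thresholds are assumed values chosen for the example.
    """
    minor_fractions = []
    for position, (major_reads, minor_reads) in sites.items():
        depth = major_reads + minor_reads
        if depth == 0:
            continue  # no coverage at this site, nothing to measure
        fraction = minor_reads / depth
        if fraction >= min_minor_fraction:
            minor_fractions.append(fraction)

    # Require several mixed sites before calling the sample contaminated,
    # so that one isolated mixed position does not trigger a flag.
    if len(minor_fractions) < min_sites:
        return False, 0.0
    return True, median(minor_fractions)


# Hypothetical read counts at four positions of one sample.
mixed_sample = {73: (900, 100), 263: (880, 120), 750: (910, 90), 1438: (1000, 0)}
flagged, level = estimate_contamination(mixed_sample)
print(flagged, round(level, 3))
```

A site-count heuristic like this cannot by itself separate contamination from genuine heteroplasmy, and it ignores the phylogeny entirely, so it should be read as a toy illustration of the signal rather than a usable detector.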