Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data.

Affiliations

Institute of Medical Informatics, University of Münster, Münster, 48149, Germany.

Laboratory Hematology, RadboudUMC, Nijmegen, 6525, Netherlands.

Publication information

Sci Rep. 2017 Feb 24;7:43169. doi: 10.1038/srep43169.

Abstract

Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies and recommendations, and thus also in their output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, re-sequencing on a different platform, and expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases, an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. The influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
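
The trade-off between sensitivity and precision reported in the abstract can be illustrated with a minimal sketch. It assumes that a tool's calls and the validated mutations are each available as a set of (chromosome, position, reference allele, alternate allele) records. The `Variant` type, the function name and the toy coordinates below are hypothetical and are not taken from the study.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this sketch."""
    chrom: str
    pos: int
    ref: str
    alt: str


def sensitivity_precision(
    called: Set[Variant], validated: Set[Variant]
) -> Tuple[float, float]:
    """Compare one tool's calls against a set of validated mutations.

    sensitivity = TP / (TP + FN): share of validated mutations the tool found.
    precision   = TP / (TP + FP): share of the tool's calls that are validated.
    """
    true_pos = len(called & validated)    # called and validated
    false_neg = len(validated - called)   # validated but missed by the tool
    false_pos = len(called - validated)   # called but not validated
    sensitivity = true_pos / (true_pos + false_neg) if validated else 0.0
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    return sensitivity, precision


# Toy data (made up for illustration, not from the paper).
validated = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 500, "G", "GA"),
    Variant("chr2", 900, "T", "C"),
}
# A permissive caller: finds three of the four validated mutations,
# but also reports five calls that were not validated.
called = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 500, "G", "GA"),
    Variant("chr1", 300, "A", "T"),
    Variant("chr1", 310, "G", "C"),
    Variant("chr2", 120, "C", "A"),
    Variant("chr2", 640, "T", "G"),
    Variant("chr2", 777, "A", "C"),
}

sens, prec = sensitivity_precision(called, validated)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

In this toy case the permissive caller reaches a sensitivity of 0.75 at a precision of only 0.375, which mirrors the general pattern the abstract describes. Matching variants by exact tuple equality is a simplification, since the same indel can be represented in more than one way unless it is normalised first.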

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f17/5324109/32221c45f672/srep43169-f1.jpg
