A study on fast calling variants from next-generation sequencing data using decision tree.

Affiliations

Shanghai Key Lab of Intelligent Information Processing, Shanghai, China.

School of Computer Science and Technology, Fudan University, Shanghai, China.

Publication information

BMC Bioinformatics. 2018 Apr 19;19(1):145. doi: 10.1186/s12859-018-2147-9.

DOI:10.1186/s12859-018-2147-9

PMID:29673316

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5907718/

Abstract

BACKGROUND

The rapid development of next-generation sequencing (NGS) technology has continuously increased the throughput of sequencing data. However, for lack of a tool that is both fast and accurate, analyzing NGS data, especially low-coverage data, remains challenging.

RESULTS

We propose a decision-tree-based variant calling algorithm. Experiments on a set of real data indicate that the algorithm achieves high accuracy and sensitivity for single-nucleotide variants (SNVs) and indels and adapts well to low-coverage data. In particular, it ran markedly faster than 3 widely used tools in our experiments.

CONCLUSIONS

We implemented our algorithm in a software tool named Fuwa and applied it, together with 4 well-known variant callers (Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools), to three sequencing data sets of the well-studied sample NA12878, produced by whole-genome, whole-exome and low-coverage whole-genome sequencing, respectively. We also conducted additional experiments on whole-genome sequencing (WGS) data of 4 newly released samples that have not been used to populate dbSNP.
