A study on fast calling variants from next-generation sequencing data using decision tree.

Affiliations

Shanghai Key Lab of Intelligent Information Processing, Shanghai, China.

School of Computer Science and Technology, Fudan University, Shanghai, China.

Publication information

BMC Bioinformatics. 2018 Apr 19;19(1):145. doi: 10.1186/s12859-018-2147-9.

DOI:10.1186/s12859-018-2147-9
PMID:29673316
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5907718/
Abstract

BACKGROUND

The rapid development of next-generation sequencing (NGS) technology has continuously been refreshing the throughput of sequencing data. However, due to the lack of a smart tool that is both fast and accurate, the analysis task for NGS data, especially those with low-coverage, remains challenging.

RESULTS

We proposed a decision-tree based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and shows good adaptability on low-coverage data. In particular, our algorithm is obviously faster than 3 widely used tools in our experiments.
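As a rough illustration of what a decision-tree-based caller does in general, the sketch below walks a small hand-written tree over per-site summary features. The abstract does not describe Fuwa's actual features, thresholds, or tree structure, so everything here (`SiteFeatures`, `call_site`, and every cutoff) is a hypothetical assumption chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class SiteFeatures:
    """Hypothetical per-site summary; not the feature set used by Fuwa."""
    depth: int              # number of reads covering the site
    alt_count: int          # reads supporting the non-reference allele
    mean_base_qual: float   # mean Phred quality of the alt-supporting bases


def call_site(f: SiteFeatures) -> str:
    """Classify one site by walking a tiny decision tree.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Node 1: too little evidence for any non-reference allele.
    if f.depth == 0 or f.alt_count < 2:
        return "ref"
    # Node 2: alt-supporting bases are of low quality, treat as noise.
    if f.mean_base_qual < 20.0:
        return "ref"
    # Node 3: split on the fraction of reads carrying the alt allele.
    alt_fraction = f.alt_count / f.depth
    if alt_fraction < 0.2:
        return "ref"
    if alt_fraction < 0.8:
        return "het"
    return "hom_alt"


if __name__ == "__main__":
    print(call_site(SiteFeatures(depth=30, alt_count=14, mean_base_qual=35.0)))
```

Each decision costs only a handful of comparisons per site, which is one plausible reason a tree-based caller could be fast, although the abstract itself does not state why the algorithm is faster.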

CONCLUSIONS

We implemented our algorithm in a software named Fuwa and applied it together with 4 well-known variant callers, i.e., Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools, to three sequencing data sets of a well-studied sample NA12878, which were produced by whole-genome, whole-exome and low-coverage whole-genome sequencing technology respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP.
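Comparing callers on a well-studied sample such as NA12878 usually means scoring each call set against a trusted reference set. The sketch below shows one common way to compute sensitivity and precision from such a comparison. The abstract does not state how accuracy and sensitivity were computed, so the exact-match rule on `(chrom, pos, ref, alt)` tuples and the function name `evaluate_calls` are assumptions.

```python
def evaluate_calls(called, truth):
    """Score a call set against a truth set.

    Variants are represented as (chrom, pos, ref, alt) tuples and matched
    exactly; this matching rule is an assumption made for illustration.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Toy data, not results from the paper.
    truth = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "GA"), ("chr2", 90, "T", "C")]
    called = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr2", 40, "G", "GA"), ("chr3", 7, "A", "C")]
    print(evaluate_calls(called, truth))
```

Exact tuple matching is the simplest choice; real comparisons often normalize indel representations first, because the same indel can be written at different positions.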

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d42c/5907718/7a166ef8e6e0/12859_2018_2147_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d42c/5907718/9cdd15f2cc13/12859_2018_2147_Fig2_HTML.jpg

Similar articles

1
A study on fast calling variants from next-generation sequencing data using decision tree.
BMC Bioinformatics. 2018 Apr 19;19(1):145. doi: 10.1186/s12859-018-2147-9.
2
Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data.
PLoS One. 2017 Aug 9;12(8):e0182272. doi: 10.1371/journal.pone.0182272. eCollection 2017.
3
Variant callers for next-generation sequencing data: a comparison study.
PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.
4
Impact of post-alignment processing in variant discovery from whole exome data.
BMC Bioinformatics. 2016 Oct 3;17(1):403. doi: 10.1186/s12859-016-1279-z.
5
An analytical workflow for accurate variant discovery in highly divergent regions.
BMC Genomics. 2016 Sep 2;17(1):703. doi: 10.1186/s12864-016-3045-z.
6
Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data.
BMC Bioinformatics. 2019 Jun 17;20(1):342. doi: 10.1186/s12859-019-2928-9.
7
Performance evaluation of indel calling tools using real short-read data.
Hum Genomics. 2015 Aug 19;9(1):20. doi: 10.1186/s40246-015-0042-2.
8
Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers.
Sci Rep. 2019 Jun 27;9(1):9345. doi: 10.1038/s41598-019-45835-3.
9
Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data.
PLoS Comput Biol. 2022 Feb 17;18(2):e1009269. doi: 10.1371/journal.pcbi.1009269. eCollection 2022 Feb.
10
A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference.
Biomed Res Int. 2015;2015:456479. doi: 10.1155/2015/456479. Epub 2015 Oct 11.

Cited by

1
Analysis of gene polymorphisms in patients with pulmonary infections based on next-generation sequencing technology and their prognostic predictive value.
Front Med (Lausanne). 2025 Jul 7;12:1599791. doi: 10.3389/fmed.2025.1599791. eCollection 2025.
2
Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels.
BMC Bioinformatics. 2021 Apr 28;22(1):218. doi: 10.1186/s12859-021-04144-1.
3
Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads.
Nat Commun. 2021 Feb 17;12(1):1077. doi: 10.1038/s41467-021-21395-x.
4
BITS2019: the sixteenth annual meeting of the Italian society of bioinformatics.
BMC Bioinformatics. 2020 Sep 16;21(Suppl 8):363. doi: 10.1186/s12859-020-03708-x.
5
Adapting Genotyping-by-Sequencing and Variant Calling for Heterogeneous Stock Rats.
G3 (Bethesda). 2020 Jul 7;10(7):2195-2205. doi: 10.1534/g3.120.401325.
6
Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data.
BMC Bioinformatics. 2019 Jun 17;20(1):342. doi: 10.1186/s12859-019-2928-9.

References

1
Next-generation sequencing: big data meets high performance computing.
Drug Discov Today. 2017 Apr;22(4):712-717. doi: 10.1016/j.drudis.2017.01.014. Epub 2017 Feb 2.
2
SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing.
BMC Genomics. 2016 Nov 14;17(1):912. doi: 10.1186/s12864-016-3281-2.
3
Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications.
Nat Genet. 2014 Aug;46(8):912-918. doi: 10.1038/ng.3036. Epub 2014 Jul 13.
4
Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls.
Nat Biotechnol. 2014 Mar;32(3):246-51. doi: 10.1038/nbt.2835. Epub 2014 Feb 16.
5
An integrated map of genetic variation from 1,092 human genomes.
Nature. 2012 Nov 1;491(7422):56-65. doi: 10.1038/nature11632.
6
An integrative variant analysis suite for whole exome next-generation sequencing data.
BMC Bioinformatics. 2012 Jan 12;13:8. doi: 10.1186/1471-2105-13-8.
7
The variant call format and VCFtools.
Bioinformatics. 2011 Aug 1;27(15):2156-8. doi: 10.1093/bioinformatics/btr330. Epub 2011 Jun 7.
8
A framework for variation discovery and genotyping using next-generation DNA sequencing data.
Nat Genet. 2011 May;43(5):491-8. doi: 10.1038/ng.806. Epub 2011 Apr 10.
9
Natural genetic variation caused by small insertions and deletions in the human genome.
Genome Res. 2011 Jun;21(6):830-9. doi: 10.1101/gr.115907.110. Epub 2011 Apr 1.
10
A map of human genome variation from population-scale sequencing.
Nature. 2010 Oct 28;467(7319):1061-73. doi: 10.1038/nature09534.