SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

Author information

Guo Yunfei, Ding Xiaolei, Shen Yufeng, Lyon Gholson J, Wang Kai

Affiliations

Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA 90033, USA.

Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA.

Publication information

Sci Rep. 2015 Sep 18;5:14283. doi: 10.1038/srep14283.

Abstract

Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users often face issues such as software compatibility, complicated configuration, and lack of access to high-performance computing facilities. Discrepancies also exist among aligners and among variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule adds parallelization that does not require a computing cluster on top of the variant callers, and facilitates normalization and intersection of variant calls to generate a high-confidence consensus call set. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations of them through a one-line command, allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turnaround is needed, SeqMule generates annotated VCF files within a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and provides quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
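To make the consensus-calling and Mendelian-error ideas more concrete, here is a minimal Python sketch. It is not SeqMule's code, and everything in it is an assumption for illustration: the `(chrom, pos, ref, alt)` tuple layout and the hypothetical helpers `trim_variant`, `consensus`, and `mendelian_error_rate`, along with their `min_callers` threshold. Normalization is simplified to trimming bases shared by REF and ALT. Real normalizers also left-align indels against the reference sequence. Genotypes are assumed to be biallelic, autosomal, and non-missing.

```python
from collections import Counter
from collections.abc import Iterable

# Hypothetical variant key: (chrom, pos, ref, alt), with 1-based POS as in VCF.
Variant = tuple[str, int, str, str]
# Hypothetical genotype: pair of allele indices (0 = REF, 1 = ALT).
Genotype = tuple[int, int]


def trim_variant(chrom: str, pos: int, ref: str, alt: str) -> Variant:
    """Simplified normalization: trim bases shared by REF and ALT.

    Keeps at least one base in each allele. Does NOT left-align indels,
    which would require the reference sequence.
    """
    # Drop shared trailing bases first; POS is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop shared leading bases, shifting POS right by one for each.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return (chrom, pos, ref, alt)


def consensus(call_sets: list[Iterable[Variant]], min_callers: int) -> set[Variant]:
    """Keep normalized variants reported by at least `min_callers` call sets."""
    counts = Counter()
    for calls in call_sets:
        # Normalize, then deduplicate within one caller so each caller votes once.
        counts.update({trim_variant(*v) for v in calls})
    return {v for v, n in counts.items() if n >= min_callers}


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child can have inherited one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def mendelian_error_rate(trio_sites: Iterable[tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, father, mother) sites that violate Mendelian inheritance."""
    sites = list(trio_sites)
    if not sites:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, f, m) for c, f, m in sites)
    return errors / len(sites)


if __name__ == "__main__":
    # Two callers report the same 1-bp deletion in different representations.
    caller_a = [("chr1", 100, "CTT", "CT"), ("chr1", 200, "A", "G")]
    caller_b = [("chr1", 100, "CT", "C"), ("chr1", 300, "G", "T")]
    print(consensus([caller_a, caller_b], min_callers=2))  # {('chr1', 100, 'CT', 'C')}
    # Second site: a homozygous-ALT child with a homozygous-REF father is an error.
    trio = [((0, 1), (0, 0), (1, 1)), ((1, 1), (0, 0), (0, 1))]
    print(mendelian_error_rate(trio))  # 0.5
```

Normalizing before intersecting matters because callers can encode the same indel differently. Without it, the deletion above would not be counted as shared. Requiring agreement from more callers trades sensitivity for precision, and setting `min_callers=1` yields the union instead.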

Similar articles

Challenges in exome analysis by LifeScope and its alternative computational pipelines.
BMC Res Notes. 2015 Sep 7;8:421. doi: 10.1186/s13104-015-1385-4.

Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses.
BMC Res Notes. 2015 Mar 8;8:72. doi: 10.1186/s13104-015-1027-x.

Impact of post-alignment processing in variant discovery from whole exome data.
BMC Bioinformatics. 2016 Oct 3;17(1):403. doi: 10.1186/s12859-016-1279-z.

Variant callers for next-generation sequencing data: a comparison study.
PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.

CoVaCS: a consensus variant calling system.
BMC Genomics. 2018 Feb 5;19(1):120. doi: 10.1186/s12864-018-4508-1.

An analytical workflow for accurate variant discovery in highly divergent regions.
BMC Genomics. 2016 Sep 2;17(1):703. doi: 10.1186/s12864-016-3045-z.

Cited by

Whole-exome sequencing to identify causative variants in juvenile sudden cardiac death.
Hum Genomics. 2024 Sep 16;18(1):102. doi: 10.1186/s40246-024-00657-x.

Kuura-An automated workflow for analyzing WES and WGS data.
PLoS One. 2024 Jan 18;19(1):e0296785. doi: 10.1371/journal.pone.0296785. eCollection 2024.

Resources and tools for rare disease variant interpretation.
Front Mol Biosci. 2023 May 10;10:1169109. doi: 10.3389/fmolb.2023.1169109. eCollection 2023.
