Comparison of Multi-Sample Variant Calling Methods for Whole Genome Sequencing.

Authors

Nho Kwangsik, West John D, Li Huian, Henschel Robert, Bharthur Apoorva, Tavares Michel C, Saykin Andrew J

Affiliations

Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA.

Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA.

Publication information

IEEE Int Conf Systems Biol. 2014 Oct;2014:59-62. doi: 10.1109/ISB.2014.6990432.

DOI:10.1109/ISB.2014.6990432

PMID:26167514

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4496949/

Abstract

Rapid advancement of next-generation sequencing (NGS) technologies has facilitated the search for genetic susceptibility factors that influence disease risk in the field of human genetics. In particular, whole genome sequencing (WGS) has been used to obtain the most comprehensive genetic variation of an individual and perform detailed evaluation of all genetic variation. To this end, sophisticated methods to accurately call high-quality variants and genotypes simultaneously on a cohort of individuals from raw sequence data are required. On chromosome 22 of 818 WGS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), which is the largest WGS related to a single disease, we compared two multi-sample variant calling methods for the detection of single nucleotide variants (SNVs) and short insertions and deletions (indels) in WGS: (1) reduce the analysis-ready reads (BAM) file to a manageable size by keeping only essential information for variant calling ("REDUCE") and (2) call variants individually on each sample and then perform a joint genotyping analysis of the variant files produced for all samples in a cohort ("JOINT"). JOINT identified 515,210 SNVs and 60,042 indels, while REDUCE identified 358,303 SNVs and 52,855 indels. JOINT identified many more SNVs and indels compared to REDUCE. The two methods had a concordance rate of 99.60% for SNVs and 99.06% for indels. For SNVs, evaluation with HumanOmni 2.5M genotyping arrays revealed a concordance rate of 99.68% for JOINT and 99.50% for REDUCE. REDUCE needed more computational time and memory compared to JOINT. Our findings indicate that the multi-sample variant calling method using the JOINT process is a promising strategy for variant detection, which should facilitate our understanding of the underlying pathogenesis of human diseases.
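
The concordance rates quoted above compare the genotypes that two call sets report at the same sites. The abstract does not state the exact definition the authors used, so the following Python sketch is only an illustration under an assumed definition: the fraction of shared sites at which both call sets report the same genotype. The function name, the site key layout, and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple

# A site is keyed by (chromosome, position, ref allele, alt allele).
Site = Tuple[str, int, str, str]
# Genotypes are stored as alt-allele counts: 0, 1, or 2.
CallSet = Dict[Site, int]


def genotype_concordance(calls_a: CallSet, calls_b: CallSet) -> float:
    """Fraction of shared sites where both call sets report the same genotype.

    Assumed definition for illustration; sites present in only one
    call set are ignored.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no sites")
    matches = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return matches / len(shared)


# Toy data, invented for illustration only (not from the study).
calls_a: CallSet = {
    ("22", 100, "A", "G"): 1,
    ("22", 200, "C", "T"): 2,
    ("22", 300, "G", "A"): 0,
    ("22", 400, "T", "C"): 1,  # not called in calls_b, so ignored
}
calls_b: CallSet = {
    ("22", 100, "A", "G"): 1,
    ("22", 200, "C", "T"): 2,
    ("22", 300, "G", "A"): 1,  # disagrees with calls_a
    ("22", 500, "A", "T"): 2,  # not called in calls_a, so ignored
}

rate = genotype_concordance(calls_a, calls_b)
print(f"{rate:.2%}")  # 2 of 3 shared sites agree -> 66.67%
```

Restricting the comparison to shared sites keeps the rate from being driven by differences in how many variants each method discovers, which the abstract reports separately as variant counts.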
