A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Author information

Medical Population Genetics Program, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA.

Publication information

Bioinformatics. 2011 Nov 1;27(21):2987-93. doi: 10.1093/bioinformatics/btr509. Epub 2011 Sep 8.

DOI: 10.1093/bioinformatics/btr509

PMID: 21903627

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3198575/

Abstract

MOTIVATION

Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of next-generation sequencing (NGS), accurate genotypes may not be easy to obtain (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications call for new methods that analyze sequence data with uncertainty.

RESULTS

We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors.
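
As an illustration of working from sequencing data without explicit genotyping, the sketch below estimates a site's alternate-allele frequency from per-sample genotype likelihoods with an expectation-maximization (EM) update under Hardy-Weinberg equilibrium. It is a generic procedure shown for orientation only, not necessarily what the samtools implementation does; the function name `estimate_allele_frequency`, its parameters, and the input layout (one list of likelihoods per sample, indexed by alternate-allele count) are assumptions made for this example.

```python
from math import comb


def estimate_allele_frequency(genotype_likelihoods, ploidy=2, tol=1e-8, max_iter=1000):
    """EM estimate of the alternate-allele frequency at one site.

    genotype_likelihoods: one sequence per sample, where entry g is
    P(reads | sample carries g alternate alleles), g = 0..ploidy.
    Hypothetical helper for illustration; assumes Hardy-Weinberg equilibrium.
    """
    n = len(genotype_likelihoods)
    if n == 0:
        raise ValueError("need at least one sample")
    for lik in genotype_likelihoods:
        if len(lik) != ploidy + 1:
            raise ValueError("each sample needs ploidy + 1 likelihoods")

    f = 0.5  # starting guess for the alternate-allele frequency
    for _ in range(max_iter):
        expected_alt = 0.0
        for lik in genotype_likelihoods:
            # E-step: genotype posterior is likelihood times binomial prior.
            weights = [
                lik[g] * comb(ploidy, g) * f**g * (1.0 - f) ** (ploidy - g)
                for g in range(ploidy + 1)
            ]
            total = sum(weights)
            if total == 0:
                raise ValueError("sample has zero likelihood under current f")
            # Expected number of alternate alleles carried by this sample.
            expected_alt += sum(g * w for g, w in enumerate(weights)) / total
        # M-step: frequency is expected alternate alleles over all chromosomes.
        f_new = expected_alt / (ploidy * n)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f
```

When every sample's likelihood puts all its weight on a single genotype, the estimate reduces to plain allele counting; with flat likelihoods the data carry no information and the estimate stays at the starting value. Genotype uncertainty from low coverage enters only through the likelihoods, so no hard genotype call is ever made.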

AVAILABILITY

http://samtools.sourceforge.net.

CONTACT

hengli@broadinstitute.org.
