SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples.

Affiliation

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.

Publication information

Genome Res. 2011 Jun;21(6):952-60. doi: 10.1101/gr.113084.110. Epub 2010 Oct 27.

DOI: 10.1101/gr.113084.110
PMID: 20980557
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3106328/

Abstract

Reductions in the cost of sequencing have enabled whole-genome sequencing to identify sequence variants segregating in a population. An efficient approach is to sequence many samples at low coverage, then to combine data across samples to detect shared variants. Here, we present methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. For each population, we first collect SNP candidates based on independent sequence calls per site. We then use MARGARITA with genotype or phased haplotype data from the same samples to collect 20 ancestral recombination graphs (ARGs). We refine the posterior probability of SNP candidates by considering possible mutations at internal branches of the 40 marginal ancestral trees inferred from the 20 ARGs at the left and right flanking genotype sites. Using a population genetic prior distribution on tree-branch length and Bayesian inference, we determine a posterior probability of the SNP being real and also the most probable phased genotype call for each individual. We present experiments on both simulation data and real data from the 1000 Genomes Project to prove the applicability of the methods. We also explore the relative tradeoff between sequencing depth and the number of sequenced samples.
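
The first step described in the abstract, collecting SNP candidates from independent per-site sequence calls and then combining evidence across samples, can be illustrated with a small Bayesian calculation. The sketch below is not the authors' method: it ignores haplotype (linkage disequilibrium) information, MARGARITA, and the ARG-based refinement entirely. The function names, the binomial read model, the Hardy-Weinberg genotype prior, and all parameter values (`err`, `alt_freq`, `prior_snp`) are assumptions chosen only for illustration.

```python
from math import comb


def genotype_likelihoods(n_ref, n_alt, err=0.01):
    """Return P(reads | g) for g = 0, 1, 2 copies of the alternate allele.

    Assumes (hypothetically) that reads are independent and that each base
    is miscalled with probability `err`.
    """
    n = n_ref + n_alt
    liks = []
    for g in (0, 1, 2):
        # Chance that one read shows the alternate allele given genotype g.
        p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err
        liks.append(comb(n, n_alt) * p_alt**n_alt * (1 - p_alt) ** n_ref)
    return liks


def site_posterior(read_counts, alt_freq=0.05, prior_snp=0.001, err=0.01):
    """Posterior probability that a site is a real SNP.

    `read_counts` is a list of (n_ref, n_alt) pairs, one per diploid sample.
    Under the "SNP" hypothesis, genotypes follow Hardy-Weinberg proportions
    at an assumed allele frequency `alt_freq`; under the "no SNP" hypothesis
    every sample is homozygous reference and alt reads are errors.
    """
    f = alt_freq
    hwe = [(1 - f) ** 2, 2 * f * (1 - f), f**2]
    lik_snp = 1.0
    lik_none = 1.0
    for n_ref, n_alt in read_counts:
        liks = genotype_likelihoods(n_ref, n_alt, err)
        lik_snp *= sum(p * l for p, l in zip(hwe, liks))
        lik_none *= liks[0]
    num = prior_snp * lik_snp
    return num / (num + (1 - prior_snp) * lik_none)


def call_genotype(n_ref, n_alt, alt_freq=0.05, err=0.01):
    """Most probable genotype (0, 1, or 2 alt copies) for one sample."""
    f = alt_freq
    hwe = [(1 - f) ** 2, 2 * f * (1 - f), f**2]
    liks = genotype_likelihoods(n_ref, n_alt, err)
    post = [p * l for p, l in zip(hwe, liks)]
    return max(range(3), key=lambda g: post[g])


if __name__ == "__main__":
    # Three samples at roughly 4x coverage; two of them carry alt reads.
    counts = [(2, 2), (4, 0), (3, 1)]
    print(site_posterior(counts))
    print([call_genotype(r, a) for r, a in counts])
```

At low coverage a single sample rarely settles the question, which is why pooling evidence across samples matters in this toy model. The paper goes further by borrowing information from shared haplotypes, which this sketch does not attempt.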
