
On the optimal design of genetic variant discovery studies.

Authors

Ionita-Laza Iuliana, Laird Nan M

Affiliation

Columbia University, USA.

Publication

Stat Appl Genet Mol Biol. 2010;9(1):Article33. doi: 10.2202/1544-6115.1581. Epub 2010 Aug 27.

DOI: 10.2202/1544-6115.1581
PMID: 20812911
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2942028/

Abstract

The recent emergence of massively parallel sequencing technologies has enabled an increasing number of human genome re-sequencing studies, notable among them being the 1000 Genomes Project. The main aim of these studies is to identify the yet unknown genetic variants in a genomic region, mostly low frequency variants (frequency less than 5%). We propose here a set of statistical tools that address how to optimally design such studies in order to increase the number of genetic variants we expect to discover. Within this framework, the tradeoff between lower coverage for more individuals and higher coverage for fewer individuals can be naturally solved. The methods here are also useful for estimating the number of genetic variants missed in a discovery study performed at low coverage. We show applications to simulated data based on coalescent models and to sequence data from the ENCODE project. In particular, we show the extent to which combining data from multiple populations in a discovery study may increase the number of genetic variants identified relative to studies on single populations.
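
The coverage tradeoff mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' method; it assumes a deliberately simple model in which a fixed sequencing budget is split as individuals × coverage, a heterozygous carrier yields Poisson(coverage / 2) variant reads, and a variant counts as discovered when at least one carrier shows `min_reads` or more variant reads. The function names, the `min_reads` threshold, and the candidate coverage values are all hypothetical choices made for illustration.

```python
import math


def poisson_tail(mean, k):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def discovery_prob(freq, n_individuals, coverage, min_reads=2):
    """Probability that a variant of frequency `freq` is seen in the sample.

    Assumptions (illustrative only): diploid individuals, independent
    sampling, and every carrier treated as heterozygous.
    """
    # Chance that one individual carries at least one copy of the variant.
    p_carrier = 1.0 - (1.0 - freq) ** 2
    # Chance that a carrier shows at least `min_reads` variant reads.
    p_detect = poisson_tail(coverage / 2.0, min_reads)
    # The variant is discovered if at least one individual carries it
    # and is sequenced deeply enough to reveal it.
    return 1.0 - (1.0 - p_carrier * p_detect) ** n_individuals


def best_design(freq, budget, coverages=(2, 4, 8, 16, 32)):
    """Pick the (n_individuals, coverage) pair maximizing discovery_prob.

    `budget` is the total coverage available, so n_individuals * coverage
    never exceeds it.
    """
    designs = [(budget // c, c) for c in coverages if budget // c >= 1]
    return max(designs, key=lambda d: discovery_prob(freq, d[0], d[1]))


if __name__ == "__main__":
    for f in (0.001, 0.01, 0.05):
        n, c = best_design(f, budget=400)
        print(f"freq={f}: {n} individuals at {c}x coverage")
```

Under these assumptions the preferred design depends on the variant frequency and the detection threshold, which is the kind of question the paper's framework addresses with a more careful statistical treatment.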


