Genome simulation approaches for synthesizing in silico datasets for human genomics.

Affiliation

Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, USA.

Publication information

Adv Genet. 2010;72:1-24. doi: 10.1016/B978-0-12-380862-2.00001-1.

DOI: 10.1016/B978-0-12-380862-2.00001-1
PMID: 21029846

Abstract

Simulated data is a necessary first step in the evaluation of new analytic methods because in simulated data the true effects are known. To successfully develop novel statistical and computational methods for genetic analysis, it is vital to simulate datasets consisting of single nucleotide polymorphisms (SNPs) spread throughout the genome at a density similar to that observed by new high-throughput molecular genomics studies. In addition, the simulation of environmental data and effects will be essential to properly formulate risk models for complex disorders. Data simulations are often criticized because they are much less noisy than natural biological data, as it is nearly impossible to simulate the multitude of possible sources of natural and experimental variability. However, simulating data in silico is the most straightforward way to test the true potential of new methods during development. Thus, advances that increase the complexity of data simulations will permit investigators to better assess new analytical methods. In this work, we will briefly describe some of the current approaches for the simulation of human genomics data describing the advantages and disadvantages of the various approaches. We will also include details on software packages available for data simulation. Finally, we will expand upon one particular approach for the creation of complex, human genomic datasets that uses a forward-time population simulation algorithm: genomeSIMLA. Many of the hallmark features of biological datasets can be synthesized in silico; still much research is needed to enhance our capabilities to create datasets that capture the natural complexity of biological datasets.
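
To make the idea of a forward-time population simulation concrete, the following is a minimal sketch in Python. It is not genomeSIMLA's algorithm or interface; the function names, parameters, and the simple model (constant population size, random mating, a uniform recombination rate between adjacent SNPs, a single causal SNP with a fixed penetrance table) are all assumptions chosen for illustration. It shows the two properties the abstract emphasizes: genotypes are produced by advancing a population generation by generation, and disease status is assigned from a model whose true effect is known by construction.

```python
import random

def make_founders(n_individuals, n_snps, allele_freq, rng):
    """Create founders; each individual is a pair of haplotypes of 0/1 alleles."""
    def haplotype():
        return [1 if rng.random() < allele_freq else 0 for _ in range(n_snps)]
    return [(haplotype(), haplotype()) for _ in range(n_individuals)]

def make_gamete(parent, recomb_rate, rng):
    """Copy alleles along the chromosome, switching between the parent's two
    haplotypes with probability recomb_rate between adjacent SNPs."""
    current = rng.randrange(2)
    gamete = []
    for i in range(len(parent[0])):
        if i > 0 and rng.random() < recomb_rate:
            current = 1 - current
        gamete.append(parent[current][i])
    return gamete

def next_generation(population, recomb_rate, rng):
    """Random mating at constant population size (hypothetical simplification)."""
    children = []
    for _ in range(len(population)):
        mother, father = rng.sample(population, 2)
        children.append((make_gamete(mother, recomb_rate, rng),
                         make_gamete(father, recomb_rate, rng)))
    return children

def simulate(n_individuals=200, n_snps=50, generations=20,
             allele_freq=0.3, recomb_rate=0.01, seed=1):
    """Advance a founder population forward in time for a number of generations."""
    rng = random.Random(seed)
    population = make_founders(n_individuals, n_snps, allele_freq, rng)
    for _ in range(generations):
        population = next_generation(population, recomb_rate, rng)
    return population

def assign_status(population, causal_snp, penetrance, rng):
    """Assign case (1) or control (0) status. penetrance[g] is the assumed
    probability of being a case given g copies of allele 1 at the causal SNP."""
    statuses = []
    for hap_a, hap_b in population:
        g = hap_a[causal_snp] + hap_b[causal_snp]
        statuses.append(1 if rng.random() < penetrance[g] else 0)
    return statuses
```

A call such as `assign_status(simulate(), causal_snp=10, penetrance=(0.05, 0.10, 0.20), rng=random.Random(2))` yields labels whose generating model is fully specified, so an analysis method can be scored on whether it recovers the causal SNP. As the abstract notes, a dataset like this is far less noisy than real biological data.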

Similar articles

1. Genome simulation approaches for synthesizing in silico datasets for human genomics.
Adv Genet. 2010;72:1-24. doi: 10.1016/B978-0-12-380862-2.00001-1.
2. Tag SNP selection in genotype data for maximizing SNP prediction accuracy.
Bioinformatics. 2005 Jun;21 Suppl 1:i195-203. doi: 10.1093/bioinformatics/bti1021.
3. Detecting local high-scoring segments: a first-stage approach for genome-wide association studies.
Stat Appl Genet Mol Biol. 2006;5:Article22. doi: 10.2202/1544-6115.1192. Epub 2006 Sep 17.
4. Current bioinformatics tools in genomic biomedical research (Review).
Int J Mol Med. 2006 Jun;17(6):967-73.
5. Alternative methods for H1 simulations in genome-wide association studies.
Hum Hered. 2012;73(2):95-104. doi: 10.1159/000336194. Epub 2012 Mar 28.
6. Applying in silico integrative genomics to genetic studies of human disease.
Int Rev Neurobiol. 2012;103:133-56. doi: 10.1016/B978-0-12-388408-4.00007-1.
7. SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies.
Bioinformatics. 2009 Feb 15;25(4):504-11. doi: 10.1093/bioinformatics/btn652. Epub 2008 Dec 19.
8. Assessing gene length biases in gene set analysis of Genome-Wide Association Studies.
Int J Comput Biol Drug Des. 2010;3(4):297-310. doi: 10.1504/IJCBDD.2010.038394. Epub 2011 Feb 4.
9. Evaluation of potential power gain with imputed genotypes in genome-wide association studies.
Hum Hered. 2009;68(1):23-34. doi: 10.1159/000210446. Epub 2009 Apr 1.
10. invertFREGENE: software for simulating inversions in population genetic data.
Bioinformatics. 2010 Mar 15;26(6):838-40. doi: 10.1093/bioinformatics/btq029. Epub 2010 Jan 26.

Cited by

1. GWASBrewer: An R Package for Simulating Realistic GWAS Summary Statistics.
Genet Epidemiol. 2025 Jan;49(1):e22594. doi: 10.1002/gepi.22594. Epub 2024 Oct 6.
2. SANTA-SIM: simulating viral sequence evolution dynamics under selection and recombination.
Virus Evol. 2019 Mar 8;5(1):vez003. doi: 10.1093/ve/vez003. eCollection 2019 Jan.
3. Genetic Simulation Resources and the GSR Certification Program.
Bioinformatics. 2019 Feb 15;35(4):709-710. doi: 10.1093/bioinformatics/bty666.
4. A heuristic method for simulating open-data of arbitrary complexity that can be used to compare and evaluate machine learning methods.
Pac Symp Biocomput. 2018;23:259-267.
5. Genetic data simulators and their applications: an overview.
Genet Epidemiol. 2015 Jan;39(1):2-10. doi: 10.1002/gepi.21876. Epub 2014 Dec 13.
6. Reproducible simulations of realistic samples for next-generation sequencing studies using Variant Simulation Tools.
Genet Epidemiol. 2015 Jan;39(1):45-52. doi: 10.1002/gepi.21867. Epub 2014 Nov 13.
7. Heuristic identification of biological architectures for simulating complex hierarchical genetic interactions.
Genet Epidemiol. 2015 Jan;39(1):25-34. doi: 10.1002/gepi.21865. Epub 2014 Nov 13.
8. Genetic simulation tools for post-genome wide association studies of complex diseases.
Genet Epidemiol. 2015 Jan;39(1):11-19. doi: 10.1002/gepi.21870. Epub 2014 Nov 4.
9. Mendel: the Swiss army knife of genetic analysis programs.
Bioinformatics. 2013 Jun 15;29(12):1568-70. doi: 10.1093/bioinformatics/btt187. Epub 2013 Apr 22.
10. Computer simulations: tools for population and evolutionary genetics.
Nat Rev Genet. 2012 Jan 10;13(2):110-22. doi: 10.1038/nrg3130.