sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs.

Affiliations

Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 60 Murray Street, Toronto, ON, M5T 3L9, Canada.

Department of Statistical Sciences, University of Toronto, Toronto, M5S 3G3, Canada.

Publication information

BMC Bioinformatics. 2019 Jan 15;20(1):26. doi: 10.1186/s12859-019-2611-1.

DOI:10.1186/s12859-019-2611-1
PMID:30646839
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6332552/
Abstract

BACKGROUND

Simulation of genetic variant data is frequently required to evaluate statistical methods in human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge of population genetics or computation to be used effectively. In addition, generating simulated data for family-based studies demands sophisticated methods and advanced programming.

RESULTS

To address these issues, we propose sim1000G, a new user-friendly and integrated R package that simulates variants in genomic regions among unrelated individuals or within families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted from it for two purposes: to compute linkage disequilibrium (LD) in the simulated genomic regions, and to generate new genotype data for unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary size are generated by modeling recombination events. To illustrate the application of sim1000G, we present several scenarios: unrelated individuals drawn from a single population, unrelated individuals from two distinct populations, and three-generation pedigree data. Sim1000G can capture allele frequency diversity, short- and long-range LD patterns, and subtle population differences in LD structure without the need for any tuning parameters.
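The RESULTS paragraph says that the covariance across variants is used to preserve LD when generating genotypes for unrelated individuals. The Python sketch below illustrates one standard way to do this, a thresholded multivariate normal (Gaussian copula). It is an assumption for illustration only and is not sim1000G's actual R code. The function names `simulate_haplotypes` and `simulate_genotypes` are hypothetical.

The sketch works in three steps:
- Take a 0/1 reference haplotype matrix, such as one extracted from a phased VCF.
- Draw latent normal vectors using the variants' correlation matrix.
- Threshold each variant at the normal quantile of its ALT allele frequency, so that simulated frequencies match the reference.

```python
from statistics import NormalDist

import numpy as np


def _threshold(p: float) -> float:
    """Latent-normal cutoff giving P(allele = 1) = p (handles monomorphic sites)."""
    if p <= 0.0:
        return -np.inf
    if p >= 1.0:
        return np.inf
    return NormalDist().inv_cdf(p)


def simulate_haplotypes(ref_haps: np.ndarray, n_haps: int, seed: int = 0) -> np.ndarray:
    """Simulate 0/1 haplotypes mimicking the allele frequencies and pairwise
    correlation (an LD proxy) of a reference haplotype matrix.

    ref_haps: shape (n_reference_haplotypes, n_variants), 0 = REF, 1 = ALT.
    Returns an int array of shape (n_haps, n_variants).
    """
    rng = np.random.default_rng(seed)
    freqs = ref_haps.mean(axis=0)                    # ALT allele frequency per variant
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(ref_haps, rowvar=False)   # variant-by-variant correlation
    corr = np.nan_to_num(corr)                       # monomorphic variants -> treat as uncorrelated
    np.fill_diagonal(corr, 1.0)
    # Latent multivariate normal carrying the reference correlation structure.
    z = rng.multivariate_normal(np.zeros(len(freqs)), corr, size=n_haps)
    cutoffs = np.array([_threshold(p) for p in freqs])
    return (z < cutoffs).astype(int)                 # column j is 1 with probability freqs[j]


def simulate_genotypes(ref_haps: np.ndarray, n_individuals: int, seed: int = 0) -> np.ndarray:
    """Pair simulated haplotypes into unphased genotypes (ALT allele counts 0/1/2)."""
    haps = simulate_haplotypes(ref_haps, 2 * n_individuals, seed)
    return haps[0::2] + haps[1::2]
```

Using the haplotype correlation directly as the latent correlation is a simplification. Dichotomizing a bivariate normal shrinks correlations toward zero, so the LD this sketch reproduces is only approximate, especially between rare variants. A dedicated simulator may correct for this.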

CONCLUSION

Sim1000G fills a gap in the vast field of genetic variant simulators through its simplicity and its independence from external tools. Currently, it is one of the few simulation packages that are fully integrated into R and able to simulate multiple genetic variants both among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies of both rare and common variants.
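Simulating variants within families requires passing haplotypes from parents to offspring through meiosis. The RESULTS paragraph states that sim1000G does this by modeling recombination events. The Python sketch below shows one meiosis and the resulting child. It is an illustration under stated assumptions and is not sim1000G's implementation; the function names `meiosis` and `child_from_parents` are hypothetical.

The sketch assumes three things:
- Variant positions are given in centimorgans.
- Crossovers follow Haldane's model, a Poisson process with no interference, averaging one crossover per Morgan.
- Crossover positions are uniform along the region's genetic length.

```python
import numpy as np


def meiosis(hap_a, hap_b, positions_cm, rng):
    """Return one gamete (a recombinant haplotype) from a parent's two haplotypes.

    positions_cm: sorted genetic positions of the variants, in centimorgans.
    Assumes Haldane's model: Poisson-distributed crossovers, no interference.
    """
    pos = np.asarray(positions_cm, dtype=float)
    start, end = pos[0], pos[-1]
    n_cross = rng.poisson((end - start) / 100.0)        # expected count = length in Morgans
    breakpoints = np.sort(rng.uniform(start, end, n_cross))
    # Number of crossovers at or to the left of each variant.
    switches = np.searchsorted(breakpoints, pos, side="right")
    first = rng.integers(2)                             # parental haplotype the gamete starts on
    use_b = (switches + first) % 2 == 1                 # odd total -> copy from hap_b
    return np.where(use_b, hap_b, hap_a)


def child_from_parents(father, mother, positions_cm, rng):
    """father, mother: (hap1, hap2) tuples. Returns the child's two haplotypes."""
    return (meiosis(*father, positions_cm, rng),
            meiosis(*mother, positions_cm, rng))
```

Larger pedigrees, such as the three-generation designs mentioned above, can be built by applying `child_from_parents` generation by generation. Founders would take haplotypes from the unrelated-individual simulation.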
