The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies.

Authors

Vaughan Laura K, Divers Jasmin, Padilla Miguel, Redden David T, Tiwari Hemant K, Pomp Daniel, Allison David B

Affiliation

Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294.

Publication

Comput Stat Data Anal. 2009 Mar 15;53(5):1755-1766. doi: 10.1016/j.csda.2008.02.032.

DOI:10.1016/j.csda.2008.02.032

PMID:20161321

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2678733/

Abstract

With the advent of powerful computers, simulation studies are becoming an important tool in statistical methodology research. However, computer simulations of a specific process are only as good as our understanding of the underlying mechanisms. An attractive supplement to simulations is the use of plasmode datasets. Plasmodes are datasets generated by natural biological processes under experimental conditions that allow some aspect of the truth to be known. The benefit of the plasmode approach is that the data are generated through completely natural processes, thus circumventing the common concern about the realism and accuracy of computer-simulated data. The estimation of admixture, or the proportion of an individual's genome that originates from different founding populations, is a particularly difficult research endeavor that is well suited to the use of plasmodes. Current methods have been tested with simulations of complex populations in which the underlying mechanisms, such as the rate and distribution of recombination, are not well understood. To demonstrate the utility of this approach, data derived from mouse crosses are used to evaluate the effectiveness of several admixture estimation methodologies. Each cross shares a common founding population so that the ancestry proportion for each individual is known, allowing for the comparison of true and estimated individual admixture values. Analysis shows that the estimation methodologies examined (Structure, AdmixMap and FRAPPE) all perform well with simple datasets. However, their performance varied greatly when applied to a plasmode consisting of three founding populations. The results of these examples illustrate the utility of plasmodes in the evaluation of statistical genetics methodologies.
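The comparison of true and estimated individual admixture values described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `compare_admixture`, the choice of summary statistics (bias, root-mean-square error, largest absolute error), and the sample values are all assumptions introduced here, since the abstract does not state which measures were used.

```python
import math


def compare_admixture(true_props, est_props):
    """Summarize agreement between known and estimated ancestry proportions.

    Both arguments are sequences with one value per individual: the
    proportion of the genome attributed to a single founding population.
    """
    if len(true_props) != len(est_props) or not true_props:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(true_props)
    # Signed error per individual: estimate minus known value.
    errors = [est - true for true, est in zip(true_props, est_props)]

    bias = sum(errors) / n                                  # mean signed error
    rmse = math.sqrt(sum(err * err for err in errors) / n)  # root-mean-square error
    max_abs_error = max(abs(err) for err in errors)         # worst single individual

    return {"bias": bias, "rmse": rmse, "max_abs_error": max_abs_error}


# Hypothetical values for illustration only (not data from the paper).
# "True" proportions stand in for ancestry known from a cross design;
# "estimated" proportions stand in for the output of an estimation program.
true_props = [0.50, 0.50, 0.75, 0.75, 0.25]
est_props = [0.48, 0.53, 0.70, 0.78, 0.29]

summary = compare_admixture(true_props, est_props)
print(summary)
```

In a plasmode setting the "true" vector comes from the experimental design rather than from a simulation model, so the same summary could be computed for each estimation program and the results compared side by side.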
