Mathematical and Statistical Sciences, University of Colorado, Denver, Denver, CO 80204, USA; Mathematics and Physical Sciences, The College of Idaho, Caldwell, ID 83605, USA.
Department of Biostatistics, Boston University School of Public Health, Boston, MA 02215, USA.
Am J Hum Genet. 2022 Apr 7;109(4):680-691. doi: 10.1016/j.ajhg.2022.02.009. Epub 2022 Mar 16.
Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.
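As a rough illustration of the quantity the abstract says existing simulators misestimate, the sketch below counts singletons and doubletons (variants whose alternate allele appears exactly once or twice in the sample) from a haplotype matrix. It is not RAREsim's algorithm or interface. The function name `allele_count_spectrum`, the 0/1 matrix layout, and the toy data are assumptions made for this example only.

```python
from collections import Counter


def allele_count_spectrum(haplotypes):
    """Tally how many variant sites have each alternate-allele count.

    `haplotypes` is assumed to be a list of rows, one row per haplotype,
    each row a list of 0/1 alleles with one entry per variant site.
    Sites with no alternate alleles are left out of the tally.
    """
    if not haplotypes:
        return Counter()
    n_sites = len(haplotypes[0])
    # Alternate-allele count per site = column sum over all haplotypes.
    counts = [sum(row[j] for row in haplotypes) for j in range(n_sites)]
    return Counter(c for c in counts if c > 0)


# Hypothetical toy data: 4 haplotypes (rows) by 4 variant sites (columns).
toy = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
]

spectrum = allele_count_spectrum(toy)
singletons = spectrum[1]  # sites where the alternate allele is seen once
doubletons = spectrum[2]  # sites where the alternate allele is seen twice
print(f"singletons={singletons}, doubletons={doubletons}")
```

Comparing such a tally between simulated and real haplotypes is one simple way to check whether a simulation reproduces the rare end of the allele-count distribution.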