Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.
Front Genet. 2013 Sep 23;4:185. doi: 10.3389/fgene.2013.00185. eCollection 2013.
Copy number variation (CNV) is a type of genetic variation in the genome. CNVs are measured from signal intensities, and these measurements can be repeated to reduce the uncertainty in PCR-based typing. Studies have shown that CNVs may lead to phenotypic variation and modify disease expression. Various challenges exist, however, in exploring CNV-disease associations. Here we construct latent variables to infer the discrete CNV values and to estimate the probability of mutations. In addition, we propose pooling rare variants to increase statistical power, and we conduct family studies to reduce the computational burden of determining the CNV composition on each chromosome. To explore, in a stochastic sense, the association between the collapsed CNV variants and disease status, we use a Bayesian hierarchical model that incorporates the mutation parameters. This model probabilistically assigns integer values to the quantitatively measured copy numbers and can test the association for all variants of interest simultaneously within a regression framework. This integrative model accounts for the uncertainty in copy number assignment and distinguishes, on the basis of posterior probabilities, whether a variant arose de novo or was inherited. For family studies, the model accommodates the dependence among family members and among repeated CNV measurements. Moreover, the model can assume Mendelian inheritance while still including and directly quantifying each individual's genetic variation, both de novo and inherited. Finally, simulation studies show that the model achieves a high true positive rate and a low false positive rate in detecting de novo mutations.
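The abstract does not give the model's equations, so the following is only a rough, hypothetical sketch of the core idea for a single parent-child trio. Integer copy numbers are treated as latent states, repeated intensity measures are treated as noisy observations of those states, and Mendelian transmission is allowed to deviate with a small mutation probability. Enumerating every latent configuration then yields the posterior probability that the child's copy number arose de novo. This is not the authors' Bayesian hierarchical model: it omits the disease regression and rare-variant pooling, and it uses exact enumeration rather than sampling. The Gaussian intensity model, the haplotype frequencies `HAP_FREQ`, the mutation probability `MU`, the noise level `SIGMA`, and all function names are illustrative assumptions.

```python
import math
from itertools import product

# Hypothetical settings for illustration only; none of these values come from the paper.
STATES = (0, 1, 2)             # assumed copy number carried on a single haplotype
HAP_FREQ = (0.02, 0.96, 0.02)  # assumed population frequencies of those haplotype states
MU = 0.01                      # assumed per-transmission mutation probability
SIGMA = 0.3                    # assumed noise of each intensity measurement


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def intensity_likelihood(replicates, total_cn):
    """P(repeated intensity measures | integer diploid copy number).

    Replicates are treated as independent draws centred on the true copy
    number, so more replicates sharpen the integer assignment."""
    lik = 1.0
    for y in replicates:
        lik *= normal_pdf(y, total_cn, SIGMA)
    return lik


def transmission(parent_hap, child_hap):
    """P(child receives child_hap | parent passes on parent_hap), allowing mutation."""
    if child_hap == parent_hap:
        return 1.0 - MU
    return MU / (len(STATES) - 1)  # mutate uniformly to any other haplotype state


def trio_posterior(father, mother, child):
    """Return (P(de novo | data), posterior over the child's diploid copy number)."""
    total = 0.0
    de_novo = 0.0
    child_cn = {}
    # Latent parental genotypes: two haplotypes each, drawn from HAP_FREQ.
    for fa, fb, ma, mb in product(STATES, repeat=4):
        prior = HAP_FREQ[fa] * HAP_FREQ[fb] * HAP_FREQ[ma] * HAP_FREQ[mb]
        parent_lik = (intensity_likelihood(father, fa + fb)
                      * intensity_likelihood(mother, ma + mb))
        # Mendelian rule: each parent passes on either haplotype with probability 1/2.
        for tf, tm in product((fa, fb), (ma, mb)):
            # Haplotypes the child actually receives, possibly after mutation.
            for cf, cm in product(STATES, repeat=2):
                p_trans = 0.25 * transmission(tf, cf) * transmission(tm, cm)
                w = prior * parent_lik * p_trans * intensity_likelihood(child, cf + cm)
                total += w
                if cf != tf or cm != tm:
                    de_novo += w
                child_cn[cf + cm] = child_cn.get(cf + cm, 0.0) + w
    posterior_cn = {c: w / total for c, w in sorted(child_cn.items())}
    return de_novo / total, posterior_cn


if __name__ == "__main__":
    father = [2.05, 1.92, 2.10]
    mother = [1.98, 2.03, 2.01]
    child = [3.02, 2.95, 3.08]  # child looks like 3 copies, both parents like 2
    p_dn, cn_post = trio_posterior(father, mother, child)
    print(f"P(de novo) = {p_dn:.3f}")
    print({c: round(p, 4) for c, p in cn_post.items()})
```

Under these assumed settings, a child whose repeated measurements sit near 3 copies while both parents sit near 2 receives a high de novo posterior, whereas a child near 2 copies receives a posterior close to zero. The balance between the two explanations depends directly on how `MU` compares with the assumed frequency of 2-copy haplotypes in the parents.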