Human Genetics Unit, Indian Statistical Institute, Kolkata, India.
National Institute of Biomedical Genomics, Kalyani, India.
Genomics. 2019 Dec;111(6):1387-1394. doi: 10.1016/j.ygeno.2018.09.011. Epub 2018 Oct 1.
To decipher the genetic architecture of human disease, various types of omics data are generated. Two common types are genotype and gene expression data. For biological and technical reasons, genotype data are often generated for a large number of individuals and gene expression data for only a few, leading to unequal sample sizes across omics data types. The lack of a standard statistical procedure for integrating such datasets motivates us to propose a two-step multi-locus association method using latent variables. Our method is more powerful than single or separate omics data analysis, and it comprehensively unravels deep-seated signals through a single statistical model. Extensive simulation confirms that it is robust to various genetic models, with power increasing with sample size and the number of associated loci. It computes p-values very quickly. Application to a real psoriasis dataset identifies 17 novel SNPs, functionally related to psoriasis-associated genes, at a much smaller sample size than a standard GWAS.
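The abstract does not spell out the two steps, so the following is only a generic illustration of how genotype and expression data with unequal sample sizes might be combined, not the authors' latent-variable model. It assumes (hypothetically) that step 1 learns SNP weights by ridge regression of expression on genotypes in the small subsample, and that step 2 scores every genotyped individual with those weights and tests the score against a quantitative phenotype using a normal approximation. All function names, the simulated data, and the parameter values are illustrative assumptions.

```python
import math
import numpy as np


def fit_expression_weights(G_small, expr_small, ridge=1.0):
    """Step 1 (assumed): ridge regression of expression on genotypes,
    fitted only on the individuals who have expression data."""
    Gc = G_small - G_small.mean(axis=0)
    yc = expr_small - expr_small.mean()
    m = Gc.shape[1]
    return np.linalg.solve(Gc.T @ Gc + ridge * np.eye(m), Gc.T @ yc)


def association_pvalue(G_full, weights, phenotype):
    """Step 2 (assumed): build a genotype-predicted expression score for
    all individuals and test its correlation with the phenotype.
    Uses a two-sided normal approximation to the correlation t-test,
    which is reasonable only for large samples."""
    score = (G_full - G_full.mean(axis=0)) @ weights
    r = np.corrcoef(score, phenotype)[0, 1]
    n = len(phenotype)
    z = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(z) / math.sqrt(2.0))


# Simulated toy data (not from the paper): 2000 genotyped individuals,
# 10 SNPs, expression measured on only the first 200 individuals.
rng = np.random.default_rng(0)
N, n_expr, m = 2000, 200, 10
G = rng.binomial(2, 0.3, size=(N, m)).astype(float)
beta = np.array([0.5] * 5 + [0.0] * 5)           # five SNPs affect expression
expression = G @ beta + rng.normal(size=N)        # true expression, mostly unobserved
phenotype = 0.5 * expression + rng.normal(size=N)

weights = fit_expression_weights(G[:n_expr], expression[:n_expr])
p_value = association_pvalue(G, weights, phenotype)
print(f"p-value for the multi-locus score: {p_value:.3g}")
```

The point of the sketch is the data flow: expression data are used only where they exist, while the association test draws on the full genotyped sample. A single score over several SNPs yields one multi-locus test rather than one test per SNP.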