Center for Applied Statistics, School of Statistics, and Statistical Consulting Center, Renmin University of China, Beijing 100872, China.
School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China.
Biostatistics. 2022 Apr 13;23(2):574-590. doi: 10.1093/biostatistics/kxaa043.
In recent biomedical research, genome-wide association studies (GWAS) have been highly successful in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most existing GWAS analyze each trait separately, ignoring the correlations among traits, and therefore suffer from insufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses tremendous theoretical and practical challenges to statistical methods. In this article, we propose a novel integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as functional objects in a joint model of multiple traits with penalization techniques. The model effectively accommodates both the high dimensionality of SNPs and the correlations among multiple traits, which facilitates information borrowing across traits. Extensive simulation studies show that, compared with four alternatives, the proposed method performs satisfactorily in identifying and estimating disease-associated genetic variants. The analysis of type 2 diabetes data yields biologically meaningful findings with good prediction accuracy and selection stability.
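The abstract does not state the model's equations. As a rough, hypothetical illustration of the two ingredients it names, the sketch below (a) treats each subject's genotypes at ordered SNP positions (rescaled to [0, 1]) as a curve projected onto a basis, and (b) fits all traits jointly with a penalty that keeps or drops each basis coefficient for every trait at once. The Fourier basis, the row-wise group-lasso penalty, the proximal-gradient solver, the simulated data, and the function names `fourier_basis`, `functional_scores`, and `group_lasso_multitrait` are all assumptions made for illustration; this is not the authors' estimator.

```python
import numpy as np

# Hypothetical illustration only: NOT the estimator from the paper.

def fourier_basis(t, n_basis):
    """Orthonormal Fourier basis on [0, 1]; returns a len(t) x n_basis matrix."""
    cols = [np.ones_like(t)]
    m = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * m * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * m * t))
        m += 1
    return np.column_stack(cols)

def functional_scores(X, positions, n_basis):
    """Least-squares basis coefficients c_i of each subject's genotype curve X_i(t)."""
    Phi = fourier_basis(positions, n_basis)        # p x L basis evaluated at SNP positions
    C, *_ = np.linalg.lstsq(Phi, X.T, rcond=None)  # L x n, one fit per subject
    return C.T                                     # n x L

def group_lasso_multitrait(Z, Y, lam, n_iter=500):
    """Proximal gradient for (0.5/n)||Y - Z B||_F^2 + lam * sum_l ||B[l, :]||_2.

    Each row of B (one basis coefficient across all K traits) is kept or
    zeroed as a group, an assumed way to borrow strength across traits.
    """
    n = Z.shape[0]
    B = np.zeros((Z.shape[1], Y.shape[1]))
    step = n / np.linalg.norm(Z, 2) ** 2           # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Z.T @ (Z @ B - Y) / n
        V = B - step * grad
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = shrink * V                             # row-wise group soft-thresholding
    return B

# Simulated toy data (assumed setup): 200 subjects, 100 SNPs, 3 traits, 7 basis functions.
rng = np.random.default_rng(0)
n, p, K, L = 200, 100, 3, 7
positions = np.sort(rng.uniform(0, 1, p))
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
Z = functional_scores(X, positions, L)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)             # standardize basis scores
B_true = np.zeros((L, K))
B_true[1] = [1.0, 0.8, 1.2]                          # one basis coefficient affects all traits
Y = Z @ B_true + 0.1 * rng.standard_normal((n, K))
B_hat = group_lasso_multitrait(Z, Y, lam=0.05)
```

Because the basis is orthonormal on [0, 1], expanding a coefficient function as β_k(t) = Σ_l b_lk φ_l(t) turns the functional term ∫ X_i(t) β_k(t) dt into the dot product of subject i's basis scores with column k of B, so the functional model reduces to an ordinary multi-response regression on the scores. Swapping in a B-spline basis or a different cross-trait penalty would change the details but not this reduction, apart from an extra Gram matrix when the basis is not orthonormal.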