Yu Qi-You, Lu Tzu-Pin, Hsiao Tzu-Hung, Lin Ching-Heng, Wu Chi-Yun, Tzeng Jung-Ying, Hsiao Chuhsing Kate
Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.
Department of Public Health, National Taiwan University, Taipei, Taiwan.
Front Genet. 2021 Sep 8;12:709555. doi: 10.3389/fgene.2021.709555. eCollection 2021.
Genomic studies have been a major approach to elucidating disease etiology and to exploring potential treatment targets for many complex diseases. Statistical analyses in these studies often face the challenges of multiplicity, weak signals, and dependence among genetic markers. The situation becomes even more complicated when multi-omics data are available. To integrate data from different platforms, various integrative analyses have been adopted, ranging from direct union or intersection operations on sets derived from separate single-platform analyses to complex hierarchical multi-level models. The former ignore the biological relationships between molecules, while the latter can be hard to interpret. In this study, we propose an integrative approach that combines single nucleotide variants (SNVs) and copy number variations (CNVs) in the same genomic unit to co-localize their concurrent effects and to deal with the sparsity due to rare variants. The approach is evaluated in simulation studies and applied to low-density lipoprotein cholesterol and triglyceride measurements from the Taiwan Biobank. The results show that the proposed method detects the collective effect of SNVs and CNVs more effectively than traditional methods. For the biobank analysis, the identified genetic regions including the gene could be novel and deserve further investigation.
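The simplest integration strategy mentioned in the abstract, a direct union or intersection of the sets produced by separate single-platform analyses, can be sketched as follows. This is only an illustration of that baseline, not the method proposed in the paper; the function name `combine_hits` and the gene labels are hypothetical placeholders, not results from the study.

```python
def combine_hits(snv_hits, cnv_hits):
    """Combine gene sets flagged by two separate single-platform analyses.

    snv_hits, cnv_hits: iterables of gene labels (hypothetical inputs).
    Returns the union (flagged on either platform) and the
    intersection (flagged on both platforms).
    """
    snv, cnv = set(snv_hits), set(cnv_hits)
    return {"union": snv | cnv, "intersection": snv & cnv}


# Made-up gene labels, for illustration only.
snv_hits = ["GENE_A", "GENE_B", "GENE_C"]
cnv_hits = ["GENE_B", "GENE_D"]

combined = combine_hits(snv_hits, cnv_hits)
print(sorted(combined["union"]))
print(sorted(combined["intersection"]))
```

Neither operation uses the fact that an SNV and a CNV may sit in the same genomic unit and act together, which is the limitation the abstract points to when motivating an integrative approach.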