Suppr超能文献

一种用于通过NGS数据识别多效性全局结构的二次正则化功能典型相关分析。

A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data.

作者信息

Lin Nan, Zhu Yun, Fan Ruzong, Xiong Momiao

机构信息

Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America.

Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, United States of America.

出版信息

PLoS Comput Biol. 2017 Oct 17;13(10):e1005788. doi: 10.1371/journal.pcbi.1005788. eCollection 2017 Oct.

Abstract

Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistic method referred to as a quadratically regularized functional CCA (QRFCCA) for association analysis which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type 1 errors. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics.

摘要

研究基因变异的多效性作用可以提高统计效能,为深入理解疾病复杂的遗传结构提供重要信息,并为设计副作用更少的有效治疗方法提供有力工具。然而,当前的多表型关联分析范式缺乏广度(同时联合分析的表型和基因变异数量)和深度(表型和基因型的层次结构)。高维多效性分析的一个关键问题是从高维基因型和表型数据中有效地提取信息丰富的内部表示和特征。为了探索基因变异的相关信息,有效降低数据维度,并克服推进基因多效性分析新统计方法和计算算法发展中的关键障碍,我们提出了一种新的统计方法,称为二次正则化函数典型相关分析(QRFCCA)用于关联分析,该方法结合了三种方法:(1)二次正则化矩阵分解,(2)函数数据分析和(3)典型相关分析(CCA)。大规模模拟表明,QRFCCA在保持适当的一类错误率的同时,比十种竞争统计方法具有更高的效能。为了进一步评估性能,将QRFCCA和其他十种统计方法应用于来自英国双胞胎研究的全基因组测序数据集。我们使用QRFCCA共鉴定出79个具有罕见变异的基因和67个具有常见变异且与46个性状显著相关的基因。结果表明,QRFCCA明显优于其他十种统计方法。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07bc/5659802/de74c96f3c8d/pcbi.1005788.g001.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验