SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based Gene-Environment Interaction Tests in Biobank Data.

Author information

Chi Jocelyn T, Ipsen Ilse C F, Hsiao Tzu-Hung, Lin Ching-Heng, Wang Li-San, Lee Wan-Ping, Lu Tzu-Pin, Tzeng Jung-Ying

Affiliations

Department of Statistics, North Carolina State University, Raleigh, NC, United States.

Department of Mathematics, North Carolina State University, Raleigh, NC, United States.

Publication information

Front Genet. 2021 Nov 2;12:710055. doi: 10.3389/fgene.2021.710055. eCollection 2021.

Abstract

The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., a gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact Algorithm for Large-scale set-based G×E tests, to permit G×E VC tests for biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations. We illustrate its utility by conducting a genome-wide gene-based G×E analysis of the Taiwan Biobank data to explore the interaction between genes and physical activity status on body mass index.
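The abstract does not spell out SEAGLE's matrix computations. As a hedged illustration of why an exact VC test need not form n×n matrices, the Python/NumPy sketch below assumes one common form of the set-based G×E score statistic, Q = yᵀ P S Sᵀ P y, where S = diag(E) G is the n×L interaction design for a set of L variants, V = σ²I + τGGᵀ is the null covariance with random genetic main effects, and P = V⁻¹ − V⁻¹X(XᵀV⁻¹X)⁻¹XᵀV⁻¹. The variance components σ² and τ are treated as known (in practice they would be estimated under the null), the function names `apply_vinv`, `gxe_vc_score` and `gxe_vc_score_dense` are hypothetical, and this is not SEAGLE's implementation. V⁻¹ is applied through the Woodbury identity, so only an L×L system is solved and memory grows with n·L rather than n². The p-value step, which for quadratic-form statistics like Q is typically based on a weighted mixture of χ² distributions, is omitted.

```python
import numpy as np

# Hypothetical sketch, NOT SEAGLE's code. Assumed null model:
#   y = X beta + G b + eps,  b ~ N(0, tau I_L),  eps ~ N(0, sigma2 I_n)
# so V = sigma2 * I_n + tau * G G^T, with sigma2 and tau treated as known.


def apply_vinv(B, G, sigma2, tau):
    """Return V^{-1} B without forming the n x n matrix V (Woodbury identity):
    V^{-1} = (1/sigma2) * [I_n - G (sigma2/tau * I_L + G^T G)^{-1} G^T]."""
    L = G.shape[1]
    core = (sigma2 / tau) * np.eye(L) + G.T @ G          # small L x L system
    return (B - G @ np.linalg.solve(core, G.T @ B)) / sigma2


def gxe_vc_score(y, X, G, E, sigma2, tau):
    """Assumed G x E VC score statistic Q = y^T P S S^T P y, S = diag(E) G."""
    Vinv_X = apply_vinv(X, G, sigma2, tau)               # n x p
    Vinv_y = apply_vinv(y, G, sigma2, tau)               # length n
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)   # GLS fixed effects under H0
    Py = Vinv_y - Vinv_X @ beta                          # P y = V^{-1} (y - X beta)
    S = E[:, None] * G                                   # n x L interaction design
    u = S.T @ Py                                         # length-L score vector
    return float(u @ u)                                  # Q = ||S^T P y||^2


def gxe_vc_score_dense(y, X, G, E, sigma2, tau):
    """Dense O(n^3) reference that forms V and P explicitly; small n only."""
    n = len(y)
    Vinv = np.linalg.inv(sigma2 * np.eye(n) + tau * G @ G.T)
    P = Vinv - Vinv @ X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
    S = E[:, None] * G
    return float(y @ P @ S @ S.T @ P @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, L = 200, 10                                       # toy sizes so the dense check is cheap
    E = rng.normal(size=n)                               # toy environmental exposure
    X = np.column_stack([np.ones(n), E])                 # intercept + E main effect
    G = rng.binomial(2, 0.3, size=(n, L)).astype(float)  # toy genotypes coded 0/1/2
    y = rng.normal(size=n)                               # toy continuous trait
    q_fast = gxe_vc_score(y, X, G, E, sigma2=1.0, tau=0.5)
    q_dense = gxe_vc_score_dense(y, X, G, E, sigma2=1.0, tau=0.5)
    print(q_fast, q_dense, np.isclose(q_fast, q_dense))  # the two values should agree
```

On toy data the Woodbury-based statistic agrees with the dense reference up to floating-point error; the dense route needs O(n²) memory and O(n³) time, which is what becomes infeasible at biobank scale, while the sketch's dominant cost is forming GᵀG in O(nL²).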
