FunSPU: A versatile and adaptive multiple functional annotation-based association test of whole-genome sequencing data.

Affiliations

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America.

Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center, Houston, Texas, United States of America.

Publication information

PLoS Genet. 2019 Apr 29;15(4):e1008081. doi: 10.1371/journal.pgen.1008081. eCollection 2019 Apr.

Abstract

Despite ongoing large-scale, population-based whole-genome sequencing (WGS) projects such as the NIH NHLBI TOPMed program and the NHGRI Genome Sequencing Program, WGS-based association analysis of complex traits remains a tremendous challenge because of the large number of rare variants, many of which are neutral and not associated with the trait. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap, and GTEx projects, may help distinguish causal rare variants from neutral ones; however, each functional annotation captures only certain aspects of biological function. Our ability to select informative annotations a priori is limited, and incorporating non-informative annotations introduces noise and reduces power. We propose FunSPU, a versatile and adaptive test that incorporates multiple biological annotations and adapts at both the annotation and variant levels, so that it maintains high power even in the presence of non-informative annotations. In addition to extensive simulations, we illustrate the proposed test using the TWINSUK cohort (n = 1,752) of the UK10K WGS data with six functional annotations: CADD, RegulomeDB, FunSeq, FunSeq2, GERP++, and GenoSkyline. We identified genome-wide significant loci on chromosome 19 near the genes TOMM40 and APOC4-APOC2 associated with low-density lipoprotein (LDL), which were replicated in the UK10K ALSPAC cohort (n = 1,497). These replicated LDL-associated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge. We have implemented the proposed test in the R package "FunSPU".
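The abstract does not spell out the test statistic, so the following is only a rough Python sketch of the general idea behind an annotation-weighted, adaptive sum-of-powered-score test with permutation p-values. It is not the FunSPU R package's API or the paper's exact statistic. The function names `spu_stats` and `adaptive_test`, the weighting scheme (multiplying each variant's score by its annotation weight), the set of powers, and the min-p combination over annotations and powers are all assumptions made for illustration.

```python
import numpy as np

def spu_stats(U, W, gammas):
    """Weighted sum-of-powered-score statistics (illustrative form, an assumption).

    U: (m,) per-variant score vector.
    W: (k, m) nonnegative weights, one row per annotation.
    Returns a (k, len(gammas)) array; gamma = inf uses the max absolute score.
    """
    WU = W * U  # broadcast each annotation's weights over the variant scores
    cols = []
    for g in gammas:
        if np.isinf(g):
            cols.append(np.max(np.abs(WU), axis=1))
        else:
            cols.append(np.abs(np.sum(WU ** int(g), axis=1)))
    return np.stack(cols, axis=1)

def adaptive_test(y, G, W, gammas=(1, 2, 4, np.inf), n_perm=999, seed=0):
    """Permutation p-value for the minimum p over annotations and powers.

    y: (n,) quantitative trait; G: (n, m) genotype matrix; W: (k, m) weights.
    """
    rng = np.random.default_rng(seed)
    r = y - y.mean()  # residuals under the null model with intercept only
    n_stat = W.shape[0] * len(gammas)
    T = np.empty((n_perm + 1, n_stat))
    T[0] = spu_stats(G.T @ r, W, gammas).ravel()  # observed statistics
    for b in range(1, n_perm + 1):
        T[b] = spu_stats(G.T @ rng.permutation(r), W, gammas).ravel()
    # p-value of every replicate (observed and permuted) within each column:
    # the fraction of replicates whose statistic is at least as large.
    N = n_perm + 1
    srt = np.sort(T, axis=0)
    p = np.empty_like(T)
    for j in range(n_stat):
        p[:, j] = (N - np.searchsorted(srt[:, j], T[:, j], side="left")) / N
    # Adaptive step: take the smallest p-value per replicate, then calibrate
    # the observed minimum against the permuted minima.
    min_p = p.min(axis=1)
    return float(np.mean(min_p <= min_p[0]))
```

Because the same permutations are reused to calibrate the minimum p-value, no second layer of resampling is needed. A row of all ones in `W` corresponds to ignoring annotations entirely, which is one way to see how a non-informative annotation can be included without dominating the result.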
