Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants.

Authors

Fore Ruby, Boehme Jaden, Li Kevin, Westra Jason, Tintle Nathan

Affiliations

Department of Biostatistics, Brown University, Providence, RI, United States.

Department of Mathematics, Oregon State University, Corvallis, OR, United States.

Publication information

Front Genet. 2020 Nov 9;11:591606. doi: 10.3389/fgene.2020.591606. eCollection 2020.

DOI:10.3389/fgene.2020.591606
PMID:33240333
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7680887/
Abstract

Gene-based tests of association (e.g., variance components and burden tests) are now common practice in analyses that attempt to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested also continues to grow. Pathway-based methods have been used to first aggregate gene-based statistical evidence and then aggregate that evidence across the pathway. This "multi-set" approach (first a gene-based test, followed by a pathway-based test) has not been thoroughly explored for evaluating genotype-phenotype associations in the age of large, sequenced datasets. In particular, we ask whether there are statistical and biological characteristics that make the multi-set approach optimal compared with simply performing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application demonstrates how our insights manifest in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains when causal variants are aggregated in subsets with fewer variants overall (a high proportion of causal variants in the subset). However, we find little advantage when the sets are non-informative (a similar proportion of causal variants across the subsets). Our application to real data further supports this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease.
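The second stage of a multi-set analysis, combining gene-level evidence into one pathway-level result, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say which combination rule the paper uses, so Fisher's method is an assumed stand-in, the gene-level p-values are assumed to be independent under the null, and the gene names and p-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!, so no external library is needed.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i  # (X/2)^i / i!, built up incrementally
        total += term
    return min(1.0, math.exp(-half_x) * total)


# Hypothetical gene-level p-values for one pathway (illustrative only).
gene_p = {"GENE_A": 0.004, "GENE_B": 0.60, "GENE_C": 0.35}
pathway_p = fisher_combine(list(gene_p.values()))
print(f"pathway-level p-value: {pathway_p:.4f}")
```

In this toy input, one subset carries most of the signal while the others are close to null, which loosely mirrors the "informative subsets" scenario the abstract describes. Whether such a two-stage combination outperforms a single test depends on the conditions the paper studies.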

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/babb/7680887/218580ad60eb/fgene-11-591606-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/babb/7680887/e2e40f619854/fgene-11-591606-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/babb/7680887/1231c33698cb/fgene-11-591606-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/babb/7680887/b160e602f178/fgene-11-591606-g004.jpg
