Tragante Vinicius, Gho Johannes M I H, Felix Janine F, Vasan Ramachandran S, Smith Nicholas L, Voight Benjamin F, Palmer Colin, van der Harst Pim, Moore Jason H, Asselbergs Folkert W
Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands.
Department of Epidemiology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands.
BioData Min. 2017 May 26;10:18. doi: 10.1186/s13040-017-0137-5. eCollection 2017.
Genetic studies for complex diseases have predominantly discovered main effects at individual loci, but have not focused on genomic and environmental contexts important for a phenotype. Gene Set Enrichment Analysis (GSEA) aims to address this by identifying sets of genes or biological pathways contributing to a phenotype, through gene-gene interactions or other mechanisms, which are not the focus of conventional association methods.
Approaches that utilize GSEA can now take input from array chips, either gene-centric or genome-wide, but are highly sensitive to study design, SNP selection and pruning strategies, SNP-to-gene mapping, and pathway definitions. Here, we present lessons learned from our experience with GSEA of heart failure, a particularly challenging phenotype due to its underlying heterogeneous etiology.
This case study shows that proper data handling is essential to avoid false-positive results. Well-defined quality control pipelines are needed to keep GSEA from yielding spurious findings.
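The sensitivity to SNP-to-gene mapping mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene coordinates, SNP identifiers, and the `flank` parameter are hypothetical toy values, and the function names are made up for illustration. It only shows how widening the window around each gene changes which genes, and therefore which gene sets, receive signal.

```python
from typing import Dict, Set, Tuple

# Hypothetical gene coordinates: name -> (chromosome, start, end).
# Toy values for illustration, not real annotations.
GENES: Dict[str, Tuple[str, int, int]] = {
    "GENE_A": ("1", 1_000, 5_000),
    "GENE_B": ("1", 8_000, 12_000),
    "GENE_C": ("2", 3_000, 9_000),
}

# Hypothetical SNPs: id -> (chromosome, position).
SNPS: Dict[str, Tuple[str, int]] = {
    "rs_toy1": ("1", 4_500),   # inside GENE_A
    "rs_toy2": ("1", 6_500),   # between GENE_A and GENE_B
    "rs_toy3": ("2", 20_000),  # far downstream of GENE_C
}


def map_snps_to_genes(
    snps: Dict[str, Tuple[str, int]],
    genes: Dict[str, Tuple[str, int, int]],
    flank: int,
) -> Dict[str, Set[str]]:
    """Assign each SNP to every gene whose body, extended by `flank` bases
    on both sides, contains the SNP position."""
    mapping: Dict[str, Set[str]] = {}
    for snp, (chrom, pos) in snps.items():
        mapping[snp] = {
            gene
            for gene, (gene_chrom, start, end) in genes.items()
            if gene_chrom == chrom and start - flank <= pos <= end + flank
        }
    return mapping


def genes_hit(mapping: Dict[str, Set[str]]) -> Set[str]:
    """Collect all genes that received at least one SNP."""
    return set().union(*mapping.values())


if __name__ == "__main__":
    # Compare a strict, a moderate, and a very permissive window.
    for flank in (0, 2_000, 20_000):
        hit = sorted(genes_hit(map_snps_to_genes(SNPS, GENES, flank)))
        print(f"flank={flank}: {hit}")
```

In this toy setting, a single intergenic SNP is assigned to no gene, to two genes, or to genes far away depending on the window, which is one way the same association results can produce different enrichment signals.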