Chattopadhyay Amrita, Lu Tzu-Pin
Institute of Epidemiology and Preventive Medicine, Department of Public Health, National Taiwan University, Taipei.
Ann Transl Med. 2019 Dec;7(24):813. doi: 10.21037/atm.2019.12.87.
Genetic variants identified through genome-wide association studies (GWAS) frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue to account for part of this "missingness" is to evaluate gene-gene interactions (epistasis) and thereby elucidate their effect on complex diseases. This can potentially help identify gene functions, pathways, and drug targets. However, exhaustively evaluating all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, otherwise known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such an exponentially growing number of SNP combinations diminishes the usefulness of traditional, parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that collapses multi-dimensional genotypes into a one-dimensional binary attribute, led to the emergence of a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have also been applied extensively in recent years to tackle the dimensionality issue associated with whole-genome gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still pose the risk of missing relevant SNPs. Furthermore, interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can take advantage of distributed computing resources in the cloud to bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource in fighting this "curse". PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large data sets.
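To make the MDR idea concrete, the following is a minimal sketch of its core step for a single pair of SNPs: each of the nine two-locus genotype cells is labeled high-risk when its case/control ratio exceeds a threshold (by default the overall case/control ratio), collapsing the two-dimensional genotype space into one binary attribute. This is an illustration under simplifying assumptions, not the original MDR software; the function name mdr_binary_attribute and the 0/1/2 genotype coding are hypothetical choices for the example.

```python
import numpy as np

def mdr_binary_attribute(geno_a, geno_b, is_case, threshold=None):
    """Collapse two SNPs (coded 0/1/2) into one binary high-risk/low-risk
    attribute, following the core MDR classification step (hypothetical
    sketch, not the canonical implementation)."""
    geno_a = np.asarray(geno_a)
    geno_b = np.asarray(geno_b)
    is_case = np.asarray(is_case, dtype=bool)

    # Default threshold: the overall case/control ratio in the sample.
    if threshold is None:
        threshold = is_case.sum() / max((~is_case).sum(), 1)

    high_risk = np.zeros(len(geno_a), dtype=int)
    # Examine each of the 3 x 3 two-locus genotype cells.
    for a in range(3):
        for b in range(3):
            cell = (geno_a == a) & (geno_b == b)
            cases = (cell & is_case).sum()
            controls = (cell & ~is_case).sum()
            # Label the cell high-risk (1) if its case/control ratio
            # exceeds the threshold; cells with no controls but some
            # cases are also treated as high-risk.
            if controls > 0 and cases / controls > threshold:
                high_risk[cell] = 1
            elif controls == 0 and cases > 0:
                high_risk[cell] = 1
    return high_risk

# Toy usage with simulated genotypes and case/control labels.
rng = np.random.default_rng(0)
g1 = rng.integers(0, 3, size=200)
g2 = rng.integers(0, 3, size=200)
y = rng.random(200) < 0.5
print(mdr_binary_attribute(g1, g2, y)[:10])
```

In a full MDR analysis this binary attribute would then be scored by cross-validated classification accuracy across all candidate SNP pairs, which is exactly where the combinatorial burden arises.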
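As an illustration of the PySpark strategy described above, the sketch below distributes an exhaustive pairwise interaction scan across a cluster and brings back only a small subset of top-ranked pairs for local follow-up. The simulated data, the chi-square scoring of the nine joint-genotype cells, and names such as score_pair are assumptions for this example, not a prescribed pipeline; it also assumes NumPy and SciPy are available on the executors.

```python
from itertools import combinations

import numpy as np
from pyspark.sql import SparkSession
from scipy.stats import chi2_contingency

spark = SparkSession.builder.appName("pairwise-epistasis-scan").getOrCreate()
sc = spark.sparkContext

# Toy genotype matrix (rows = samples, columns = SNPs, coded 0/1/2)
# and binary case/control labels; real data would be loaded from
# genotype files rather than simulated.
rng = np.random.default_rng(1)
n_samples, n_snps = 500, 50
genotypes = rng.integers(0, 3, size=(n_samples, n_snps))
labels = rng.integers(0, 2, size=n_samples)

# Broadcast the shared data to every executor once rather than
# shipping it with each task.
bc_geno = sc.broadcast(genotypes)
bc_labels = sc.broadcast(labels)

def score_pair(pair):
    """Chi-square test of the 9 two-locus genotype cells vs. case status."""
    i, j = pair
    g, y = bc_geno.value, bc_labels.value
    cell = g[:, i] * 3 + g[:, j]          # joint genotype code 0..8
    table = np.zeros((9, 2))
    for c, lab in zip(cell, y):
        table[c, lab] += 1
    table = table[table.sum(axis=1) > 0]  # drop empty genotype cells
    chi2, p, _, _ = chi2_contingency(table)
    return (p, (i, j))

# Distribute all C(n_snps, 2) candidate pairs across the cluster and
# return only the 10 most significant pairs for local inspection.
pairs = sc.parallelize(list(combinations(range(n_snps), 2)), numSlices=64)
for p, (i, j) in pairs.map(score_pair).takeOrdered(10, key=lambda t: t[0]):
    print(f"SNP{i} x SNP{j}: p = {p:.3g}")

spark.stop()
```

The design point the abstract makes maps directly onto this pattern: the heavy combinatorial work stays on the cluster, while only a small, ranked subset of results returns to the driver for further local analysis.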