GenEpi: gene-based epistasis discovery using machine learning.

Affiliations

Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, 10617, Taiwan.

Taiwan AI Labs, Taipei, 10351, Taiwan.

Publication information

BMC Bioinformatics. 2020 Feb 24;21(1):68. doi: 10.1186/s12859-020-3368-2.

DOI: 10.1186/s12859-020-3368-2

PMID: 32093643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7041299/

Abstract

BACKGROUND

Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, i.e., interactions between genetic variants that are associated with phenotypes, remain limited. We believe that an efficient and effective GWAS method for detecting epistasis will be key to uncovering complex pathogenic mechanisms, which is especially important for complex diseases such as Alzheimer's disease (AD).
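To make the notion of epistasis concrete, the following toy Python example (a hypothetical illustration, not data or a model from the paper) shows why single-variant association tests can miss it. It assumes a balanced cohort in which two carrier indicators occur in every combination equally often and the phenotype is their XOR: each variant on its own shows no difference in case rate, yet the joint genotype determines the phenotype completely.

```python
from itertools import product


def case_rate(rows, key):
    """Return the fraction of cases in each group defined by key(g1, g2)."""
    groups = {}
    for g1, g2, y in rows:
        groups.setdefault(key(g1, g2), []).append(y)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical balanced cohort: each combination of two carrier indicators
# (0 = non-carrier, 1 = carrier) occurs 25 times, and the phenotype is
# their XOR, i.e. a purely epistatic model with no marginal effects.
rows = [(g1, g2, g1 ^ g2)
        for g1, g2 in product([0, 1], repeat=2)
        for _ in range(25)]

marginal_snp1 = case_rate(rows, lambda g1, g2: g1)
marginal_snp2 = case_rate(rows, lambda g1, g2: g2)
joint = case_rate(rows, lambda g1, g2: (g1, g2))

print(marginal_snp1)  # {0: 0.5, 1: 0.5}
print(marginal_snp2)  # {0: 0.5, 1: 0.5}
print(joint)          # {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```

Real epistatic effects are rarely this clean, but the same pattern, weak marginal signal combined with strong joint signal, is what interaction-aware methods aim to capture.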

RESULTS

To this end, this study presents GenEpi, a computational package that uncovers phenotype-associated epistasis using a machine learning approach proposed here. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi generates features with two-element combinatorial encoding and builds its prediction models with L1-regularized regression combined with stability selection. On simulated data, GenEpi outperformed other widely used methods at detecting the ground-truth epistasis. For real data, this study uses AD as an example to demonstrate GenEpi's ability to find disease-related variants and variant interactions that are both biologically meaningful and predictive.
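The abstract names the modeling ingredients without specifying their exact form, so the Python sketch below shows one plausible reading under explicit assumptions; it is not GenEpi's implementation, and every function name in it is hypothetical. It assumes genotypes are 0/1/2 minor-allele counts, each SNP is one-hot encoded into three indicators, and a two-element feature is the product (logical AND) of two indicators from different SNPs. The L1-regularized model is a plain lasso fitted by coordinate descent on a quantitative phenotype (for a case/control phenotype such as AD, L1-regularized logistic regression would be the natural substitute). Stability selection is taken to mean refitting on random half-subsamples and keeping features selected in at least 60% of fits; the regularization strength, number of rounds and threshold are illustrative choices. The sketch covers a single modeling stage only; in the two-stage workflow, presumably the first stage is applied within each gene and the second to the features selected across genes.

```python
from itertools import combinations

import numpy as np


def one_hot(G):
    """Encode an (n, p) matrix of 0/1/2 genotypes as (n, 3p) indicators."""
    n, p = G.shape
    X = np.zeros((n, 3 * p))
    names = []
    for j in range(p):
        for g in range(3):
            X[:, 3 * j + g] = G[:, j] == g
            names.append((f"snp{j}={g}",))
    return X, names


def two_element_encoding(G):
    """Single indicators plus pairwise products of indicators from different SNPs."""
    X1, names1 = one_hot(G)
    cols, names = [X1], list(names1)
    for a, b in combinations(range(X1.shape[1]), 2):
        if a // 3 == b // 3:
            continue  # same SNP: genotypes are mutually exclusive, product is always 0
        cols.append((X1[:, a] * X1[:, b])[:, None])
        names.append(names1[a] + names1[b])
    return np.hstack(cols), names


def standardize(X):
    """Center each column and scale it to unit variance (constant columns become 0)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


def lasso_cd(X, y, alpha, max_sweeps=500, tol=1e-6):
    """Minimize (1/2n)||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate descent."""
    n, d = X.shape
    Xt = np.ascontiguousarray(X.T)  # row j holds feature j
    col_sq = (Xt ** 2).sum(axis=1) / n
    b = np.zeros(d)
    r = y.astype(float)  # residual y - Xb for b = 0 (astype returns a copy)
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            rho = Xt[j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft-threshold
            step = new - b[j]
            if step != 0.0:
                r -= step * Xt[j]
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def stability_selection(X, y, alpha=0.1, n_rounds=30, seed=0):
    """Fraction of half-sample lasso fits in which each feature is nonzero."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        ys = y[idx] - y[idx].mean()
        counts += lasso_cd(standardize(X[idx]), ys, alpha) != 0
    return counts / n_rounds


# Toy data (hypothetical): 4 SNPs, and a quantitative phenotype driven only
# by the joint genotype "snp0 = 2 and snp1 = 2" plus Gaussian noise.
rng = np.random.default_rng(42)
G = rng.integers(0, 3, size=(400, 4))
y = 3.0 * ((G[:, 0] == 2) & (G[:, 1] == 2)) + rng.normal(0.0, 0.5, size=400)

X, names = two_element_encoding(G)
freq = stability_selection(X, y)
selected = [names[i] for i in np.argsort(-freq) if freq[i] >= 0.6]
print(selected)
```

In this toy setup the joint-genotype feature should be selected in essentially every subsample. Note that the number of two-element features grows quadratically with the number of SNPs (here 12 single plus 54 pairwise features for 4 SNPs), which is one practical reason to build models gene by gene before combining them.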

CONCLUSIONS

The results on simulated data and on AD demonstrate that GenEpi can detect phenotype-associated epistasis effectively and efficiently. The released package is generally applicable and should greatly facilitate studies of many complex diseases in the near future.

Similar articles

Protocol for Epistasis Detection with Machine Learning Using GenEpi Package.

Methods Mol Biol. 2021;2212:291-305. doi: 10.1007/978-1-0716-0947-7_18.

Novel Alzheimer's disease genes and epistasis identified using machine learning GWAS platform.

Sci Rep. 2023 Oct 17;13(1):17662. doi: 10.1038/s41598-023-44378-y.

A whole-genome simulator capable of modeling high-order epistasis for complex disease.

Genet Epidemiol. 2013 Nov;37(7):686-94. doi: 10.1002/gepi.21761. Epub 2013 Oct 1.

Revisiting genome-wide association studies from statistical modelling to machine learning.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa263.

Performance of epistasis detection methods in semi-simulated GWAS.

BMC Bioinformatics. 2018 Jun 18;19(1):231. doi: 10.1186/s12859-018-2229-8.

WISH-R- a fast and efficient tool for construction of epistatic networks for complex traits and diseases.

BMC Bioinformatics. 2018 Jul 31;19(1):277. doi: 10.1186/s12859-018-2291-2.

A novel approach for multi-SNP GWAS and its application in Alzheimer's disease.

BMC Bioinformatics. 2016 Jul 25;17 Suppl 7(Suppl 7):268. doi: 10.1186/s12859-016-1093-7.

Brief Survey on Machine Learning in Epistasis.

Methods Mol Biol. 2021;2212:169-179. doi: 10.1007/978-1-0716-0947-7_11.

High-throughput analysis of epistasis in genome-wide association studies with BiForce.

Bioinformatics. 2012 Aug 1;28(15):1957-64. doi: 10.1093/bioinformatics/bts304. Epub 2012 May 21.

Cited by

Intervention of machine learning in bladder cancer research using multi-omics datasets: systematic review on biomarker identification.

Discov Oncol. 2025 Jun 5;16(1):1010. doi: 10.1007/s12672-025-02734-6.

Machine Learning Methods for Classifying Multiple Sclerosis and Alzheimer's Disease Using Genomic Data.

Int J Mol Sci. 2025 Feb 27;26(5):2085. doi: 10.3390/ijms26052085.

Considerations in the search for epistasis.

Genome Biol. 2024 Nov 19;25(1):296. doi: 10.1186/s13059-024-03427-z.

Using GWAS and Machine Learning to Identify and Predict Genetic Variants Associated with Foodborne Bacteria Phenotypic Traits.

Methods Mol Biol. 2025;2852:223-253. doi: 10.1007/978-1-0716-4100-2_16.

Epistasis and pleiotropy-induced variation for plant breeding.

Plant Biotechnol J. 2024 Oct;22(10):2788-2807. doi: 10.1111/pbi.14405. Epub 2024 Jun 14.

Genome-Wide Epistasis Study of Cerebrospinal Fluid Hyperphosphorylated Tau in ADNI Cohort.

Genes (Basel). 2023 Jun 23;14(7):1322. doi: 10.3390/genes14071322.

Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies.

G3 (Bethesda). 2023 Aug 9;13(8). doi: 10.1093/g3journal/jkad118.

Wide and deep learning based approaches for classification of Alzheimer's disease using genome-wide association studies.

PLoS One. 2023 May 1;18(5):e0283712. doi: 10.1371/journal.pone.0283712. eCollection 2023.

Genetic risk factors for ME/CFS identified using combinatorial analysis.

J Transl Med. 2022 Dec 14;20(1):598. doi: 10.1186/s12967-022-03815-8.

Genome-wide association study reveals ethnicity-specific SNPs associated with ankylosing spondylitis in the Taiwanese population.

J Transl Med. 2022 Dec 12;20(1):589. doi: 10.1186/s12967-022-03701-3.

