Disease Gene Prediction by Integrating PPI Networks, Clinical RNA-Seq Data and OMIM Data.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):222-232. doi: 10.1109/TCBB.2017.2770120. Epub 2017 Nov 7.

DOI:10.1109/TCBB.2017.2770120

Abstract

Disease gene prediction is a challenging task with a variety of applications, such as early diagnosis and drug development. Existing machine learning methods suffer from sample imbalance because the number of known disease genes (positive samples) is much smaller than the number of unknown genes, which are typically treated as negative samples. In addition, most methods do not use clinical data from patients with a specific disease to predict disease genes. In this study, we propose a disease gene prediction algorithm (called dgSeq) that combines protein-protein interaction (PPI) networks, clinical RNA-Seq data, and Online Mendelian Inheritance in Man (OMIM) data. dgSeq constructs differential networks based on rewiring information calculated from clinical RNA-Seq data. To select balanced sets of non-disease genes (negative samples), a disease-gene network is also constructed from OMIM data. After features are extracted from the PPI networks and the differential networks, logistic regression classifiers are trained. dgSeq obtains AUC values of 0.88, 0.83, and 0.80 for identifying breast cancer genes, thyroid cancer genes, and Alzheimer's disease genes, respectively, indicating its superiority over three competing methods. Both gene set enrichment analysis and the predicted results demonstrate that dgSeq can effectively predict new disease genes.
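The pipeline described in the abstract (differential network, balanced negative sampling, feature extraction, logistic regression, AUC) can be illustrated with a minimal standard-library sketch. It is not the dgSeq implementation: the abstract does not define the rewiring measure, the features, or the negative-selection rule, so everything below is an assumption made for illustration. Rewiring is taken to be the absolute change in Pearson co-expression between patients and controls on each PPI edge, the two features (degree and summed rewiring) are hypothetical, negatives are drawn uniformly rather than from an OMIM-based disease-gene network, and the classifier is a hand-rolled SGD logistic regression.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def rewiring_scores(edges, expr_case, expr_ctrl):
    """Assumed rewiring measure: |corr(patients) - corr(controls)| per PPI edge."""
    return {
        (u, v): abs(pearson(expr_case[u], expr_case[v])
                    - pearson(expr_ctrl[u], expr_ctrl[v]))
        for u, v in edges
    }


def node_features(genes, edges, scores):
    """Two illustrative features per gene: PPI degree and total incident rewiring."""
    feats = {g: [0.0, 0.0] for g in genes}
    for u, v in edges:
        for g in (u, v):
            feats[g][0] += 1.0
            feats[g][1] += scores[(u, v)]
    return feats


def sample_balanced_negatives(genes, positives, rng):
    """Draw as many unlabeled genes as positives (uniformly, unlike the paper)."""
    pool = sorted(g for g in genes if g not in positives)
    return rng.sample(pool, min(len(positives), len(pool)))


def predict(w, x):
    """Predicted probability that the gene with features x is a disease gene."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain SGD logistic regression; returns [bias, w1, ..., wd]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = predict(w, xi) - yi
            w[0] -= lr * grad
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * grad * xj
    return w


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    genes = ["A", "B", "C", "D"]
    edges = [("A", "B"), ("B", "C"), ("C", "D")]
    ctrl = {g: [1, 2, 3, 4] for g in genes}
    case = dict(ctrl, B=[4, 3, 2, 1])  # gene B flips its co-expression in patients
    feats = node_features(genes, edges, rewiring_scores(edges, case, ctrl))
    positives = {"B"}
    negatives = sample_balanced_negatives(genes, positives, random.Random(0))
    train = sorted(positives) + negatives
    labels = [1 if g in positives else 0 for g in train]
    w = fit_logistic([feats[g] for g in train], labels)
    # Training-set AUC only; a real evaluation would use cross-validation.
    print(auc(labels, [predict(w, feats[g]) for g in train]))
```

Sampling exactly as many negatives as there are positives is what keeps the training set balanced. In a realistic setting the sampling would typically be repeated and the classifier evaluated on held-out genes rather than on the training set.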

Similar articles

Ensemble disease gene prediction by clinical sample-based networks.

BMC Bioinformatics. 2020 Mar 11;21(Suppl 2):79. doi: 10.1186/s12859-020-3346-8.

Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression.

Mol Biosyst. 2014 Jun;10(6):1400-8. doi: 10.1039/c3mb70588a. Epub 2014 Apr 3.

DeepHE: Accurately predicting human essential genes based on deep learning.

PLoS Comput Biol. 2020 Sep 16;16(9):e1008229. doi: 10.1371/journal.pcbi.1008229. eCollection 2020 Sep.

Identification of biomarkers associated with diagnosis and prognosis of colorectal cancer patients based on integrated bioinformatics analysis.

Gene. 2019 Apr 15;692:119-125. doi: 10.1016/j.gene.2019.01.001. Epub 2019 Jan 14.

Predicting diabetes mellitus genes via protein-protein interaction and protein subcellular localization information.

BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):433. doi: 10.1186/s12864-016-2795-y.

Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

BMC Bioinformatics. 2016 Nov 10;17(1):453. doi: 10.1186/s12859-016-1317-x.

Ensemble decision of local similarity indices on the biological network for disease related gene prediction.

PeerJ. 2024 Sep 5;12:e17975. doi: 10.7717/peerj.17975. eCollection 2024.

Identification of molecular targets for esophageal carcinoma diagnosis using miRNA-seq and RNA-seq data from The Cancer Genome Atlas: a study of 187 cases.

Oncotarget. 2017 May 30;8(22):35681-35699. doi: 10.18632/oncotarget.16051.

Prediction of microRNA-regulated protein interaction pathways in Arabidopsis using machine learning algorithms.

Comput Biol Med. 2013 Nov;43(11):1645-52. doi: 10.1016/j.compbiomed.2013.08.010. Epub 2013 Aug 22.

Cited by

GhostBuster: A Deep-Learning-based, Literature-Unbiased Gene Prioritization Tool for Gene Annotation Prediction.

bioRxiv. 2025 Jun 27:2025.06.22.660948. doi: 10.1101/2025.06.22.660948.

DyNDG: Identifying Leukemia-related Genes Based on Time-series Dynamic Network by Integrating Differential Genes.

Genomics Proteomics Bioinformatics. 2025 May 30;23(2). doi: 10.1093/gpbjnl/qzaf037.

Unraveling the Mysteries of Alzheimer's Disease Using Artificial Intelligence.

Rev Recent Clin Trials. 2025;20(2):124-141. doi: 10.2174/0115748871330861241030143321.

Machine learning and deep learning-based approach in smart healthcare: Recent advances, applications, challenges and opportunities.

AIMS Public Health. 2024 Jan 5;11(1):58-109. doi: 10.3934/publichealth.2024004. eCollection 2024.

A Study and Analysis of Disease Identification using Genomic Sequence Processing Models: An Empirical Review.

Curr Genomics. 2023 Dec 12;24(4):207-235. doi: 10.2174/0113892029269523231101051455.

Prioritization of risk genes for Alzheimer's disease: an analysis framework using spatial and temporal gene expression data in the human brain based on support vector machine.

Front Genet. 2023 Oct 6;14:1190863. doi: 10.3389/fgene.2023.1190863. eCollection 2023.

Potential protective effects of Huanglian Jiedu Decoction against COVID-19-associated acute kidney injury: A network-based pharmacological and molecular docking study.

Open Med (Wars). 2023 Jul 6;18(1):20230746. doi: 10.1515/med-2023-0746. eCollection 2023.

Predicting disease genes based on multi-head attention fusion.

BMC Bioinformatics. 2023 Apr 21;24(1):162. doi: 10.1186/s12859-023-05285-1.

NTD-DR: Nonnegative tensor decomposition for drug repositioning.

PLoS One. 2022 Jul 21;17(7):e0270852. doi: 10.1371/journal.pone.0270852. eCollection 2022.

The Road to Personalized Medicine in Alzheimer's Disease: The Use of Artificial Intelligence.

Biomedicines. 2022 Jan 29;10(2):315. doi: 10.3390/biomedicines10020315.
