Ensemble disease gene prediction by clinical sample-based networks.

Affiliations

Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada.

School of Information, Beijing Wuzi University, Beijing, 101149, China.

Publication information

BMC Bioinformatics. 2020 Mar 11;21(Suppl 2):79. doi: 10.1186/s12859-020-3346-8.

Abstract

BACKGROUND

Disease gene prediction is a critical and challenging task. Many computational methods have been developed to predict disease genes, which can reduce the money and time spent on experimental validation. Since proteins (the products of genes) usually work together to achieve a specific function, biomolecular networks, such as the protein-protein interaction (PPI) network and gene co-expression networks, are widely used to predict disease genes by analyzing the relationships between known disease genes and other genes in the networks. However, existing methods commonly use a universal static PPI network, which ignores the fact that PPIs are dynamic and that PPIs should also differ from patient to patient.

RESULTS

To address these issues, we develop an ensemble algorithm to predict disease genes from clinical sample-based networks (EdgCSN). The algorithm first constructs a single sample-based network for each case sample of the disease under study. Then, these single sample-based networks are merged into several fused networks based on the clustering results of the samples. After that, logistic models are trained with centrality features extracted from the fused networks, and an ensemble strategy is used to predict the final probability of each gene being disease-associated. EdgCSN is evaluated on breast cancer (BC), thyroid cancer (TC) and Alzheimer's disease (AD) and obtains AUC values of 0.970, 0.971 and 0.966, respectively, which are much better than those of the competing algorithms. Subsequent de novo validations also demonstrate the ability of EdgCSN to predict new disease genes.
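The last two steps of the pipeline (centrality features per fused network, one logistic model per network, then an ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which centrality measures or which ensemble strategy EdgCSN uses, so the sketch assumes degree and closeness centrality, a plain gradient-descent logistic regression, and simple averaging of the per-network probabilities. The toy networks, gene names and labels are made up for illustration, and the fused networks are taken as given rather than built from expression data.

```python
import math
from collections import deque

def build(genes, edges):
    """Undirected network as an adjacency dict: gene -> set of neighbours."""
    adj = {g: set() for g in genes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def features(adj):
    """Two centrality features per gene: degree and closeness (assumed choice)."""
    n, feats = len(adj), {}
    for src in adj:
        dist, queue = {src: 0}, deque([src])
        while queue:  # breadth-first search for shortest-path lengths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        closeness = (len(dist) - 1) / total if total else 0.0
        feats[src] = [len(adj[src]) / (n - 1), closeness]
    return feats

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def ensemble_predict(networks, labels):
    """One model per fused network; final score = mean probability (assumed)."""
    genes = sorted(networks[0])
    train = [g for g in genes if g in labels]
    scores = {g: 0.0 for g in genes}
    for adj in networks:
        feats = features(adj)
        w, b = train_logistic([feats[g] for g in train], [labels[g] for g in train])
        for g in genes:
            z = sum(wj * xj for wj, xj in zip(w, feats[g])) + b
            scores[g] += sigmoid(z) / len(networks)
    return scores

# Hypothetical toy data: two fused networks over six genes.
genes = ["A", "B", "C", "D", "E", "F"]
net1 = build(genes, [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
                     ("B", "C"), ("B", "D"), ("E", "F")])
net2 = build(genes, [("A", "B"), ("A", "C"), ("A", "D"), ("A", "F"),
                     ("B", "C"), ("B", "D"), ("B", "E")])
labels = {"A": 1, "B": 1, "E": 0, "F": 0}  # C and D are unlabeled candidates
scores = ensemble_predict([net1, net2], labels)
```

Here `scores` maps every gene, including the unlabeled candidates, to an averaged probability of being disease-associated. Averaging is only one possible ensemble rule; weighted voting or stacking would slot into `ensemble_predict` in the same place.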

CONCLUSIONS

In this study, we propose EdgCSN, an ensemble learning algorithm for predicting disease genes with models trained on centrality features extracted from clinical sample-based networks. Results of the leave-one-out cross-validation show that EdgCSN performs much better than the competing algorithms in predicting BC-associated, TC-associated and AD-associated genes. De novo validations also show that EdgCSN is valuable for identifying new disease genes.
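For readers unfamiliar with the reported metric, the sketch below shows how an AUC value can be computed from held-out scores. It uses the rank-based definition: the probability that a randomly chosen positive gene is scored above a randomly chosen negative one. The scores and labels are invented, and the assumption that each score comes from a model trained without that gene (as leave-one-out cross-validation would imply) is ours; the abstract does not describe the evaluation protocol in detail.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out scores: one entry per gene, each assumed to be
# predicted by a model that never saw that gene's label during training.
held_out_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
held_out_labels = [1, 1, 0, 1, 0]
value = auc(held_out_scores, held_out_labels)
```

An AUC of 1.0 means every known disease gene outranks every non-disease gene, while 0.5 corresponds to random ranking.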

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93ec/7068856/b071c0f5757d/12859_2020_3346_Fig1_HTML.jpg
