Gene prediction of aging-related diseases based on DNN and Mashup.

Author information

School of Information Science and Engineering, Yunnan University, Kunming 650000, China.

Publication information

BMC Bioinformatics. 2021 Dec 17;22(1):597. doi: 10.1186/s12859-021-04518-5.

Abstract

BACKGROUND

Current bioinformatics research on the relationship between aging-related diseases and genes mainly builds multi-label machine learning models that classify each gene. Most existing methods for predicting pathogenic genes either rely on a specific type of gene feature, or directly encode multiple features of different dimensions with the same encoder and concatenate them to predict the final result, which limits the applicability of the algorithm. Possible shortcomings include incomplete coverage of gene features by a single type of omics data, overfitting of low-dimensional datasets by a single encoder, and underfitting of higher-dimensional datasets.

METHODS

We use known gene-disease association data and gene descriptors, such as Gene Ontology (GO) terms, protein-protein interaction (PPI) data, PathDIP, and Kyoto Encyclopedia of Genes and Genomes (KEGG) data, as input for deep learning to predict associations between genes and diseases. Our innovation is to use the Mashup algorithm to reduce the dimensionality of PPI, GO, and other large biological networks, to add new pathway data from the KEGG database, and then to combine these biological information sources through a modular deep neural network (DNN) to predict genes related to aging diseases.
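The two ideas in this paragraph, compressing large networks into low-dimensional gene features and giving each information source its own encoder, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy networks, the restart probability, the layer sizes, and the function names are hypothetical. The compression step is a simplified stand-in for Mashup (random walk with restart followed by a truncated SVD of the log-transformed diffusion states), and the network weights are random and untrained.

```python
import numpy as np

def rwr_diffusion(adj, restart=0.5):
    """Random walk with restart; column i is the diffusion state of node i."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    transition = adj / col_sums            # column-stochastic transition matrix
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * transition)

def compress_networks(networks, dim, restart=0.5):
    """Simplified Mashup-like step: joint low-dimensional features for all networks."""
    n = np.asarray(networks[0]).shape[0]
    logs = [np.log(rwr_diffusion(a, restart) + 1.0 / n) for a in networks]
    stacked = np.vstack(logs)              # shape (k * n, n), one block per network
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:dim].T * np.sqrt(s[:dim])   # shape (n, dim), one row per gene

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layer(rng, in_dim, out_dim):
    """Randomly initialised dense layer (untrained, for shape illustration only)."""
    return {"W": rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, out_dim)),
            "b": np.zeros(out_dim)}

def modular_forward(sources, modules, head):
    """One encoder per feature source, concatenation, then a multi-label output."""
    encoded = [relu(x @ m["W"] + m["b"]) for x, m in zip(sources, modules)]
    merged = np.concatenate(encoded, axis=1)
    return sigmoid(merged @ head["W"] + head["b"])

rng = np.random.default_rng(0)
n_genes, n_diseases = 6, 3

# Toy stand-ins for two biological networks over the same six genes.
ring = np.roll(np.eye(n_genes), 1, axis=1)
ppi = ring + ring.T                                   # ring-shaped toy PPI network
go_sim = np.ones((n_genes, n_genes)) - np.eye(n_genes)  # fully connected toy GO network

network_features = compress_networks([ppi, go_sim], dim=4)
kegg = rng.integers(0, 2, size=(n_genes, 10)).astype(float)  # toy pathway membership

modules = [init_layer(rng, 4, 8), init_layer(rng, 10, 8)]
head = init_layer(rng, 16, n_diseases)
probs = modular_forward([network_features, kegg], modules, head)
print(probs.shape)
```

Each row of `probs` holds one gene's per-disease scores. Keeping a separate encoder per source lets a small input (the 4-dimensional network features) and a larger one (the 10-column pathway matrix) each get a module of suitable size before they are merged.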

RESULTS AND CONCLUSION

The results show that our algorithm is more effective than a standard neural network (the area under the ROC curve rises from 0.8795 to 0.9153), a gradient-boosted tree classifier, and a logistic regression classifier. In this paper, we first use a DNN to learn, from a complex multi-dimensional feature space, genes similar to those associated with known diseases, and then provide evidence that the hypothesized genes are associated with a given disease.
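For readers unfamiliar with the metric quoted above, the area under the ROC curve can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it that way for a single label; the function name and the toy labels and scores are illustrative assumptions and have no connection to the paper's data.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy example: two negatives and two positives for one disease label.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

In a multi-label setting such as the one described here, a value like this would typically be computed per disease label and then averaged, although the abstract does not state which averaging scheme was used.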

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/343f/8680025/41faf7d65347/12859_2021_4518_Fig1_HTML.jpg
