A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification.

Affiliations

Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan.

Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan.

Publication information

Int J Mol Sci. 2020 Nov 28;21(23):9070. doi: 10.3390/ijms21239070.

Abstract

Essential genes contain key genomic information that could be central to a comprehensive understanding of life and evolution. Because of this importance, identifying essential genes is considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular because they reduce the cost and time of traditional experiments. A few models have addressed this problem, but their performance remains unsatisfactory because of high-dimensional features and reliance on traditional machine learning algorithms. Thus, a novel model is needed to improve predictive performance using DNA sequence features. This study took advantage of a natural language processing (NLP) model to learn from biological sequences by treating them as natural language words. To learn from the NLP features, a supervised model based on an ensemble deep neural network was subsequently employed. The proposed method identified essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. It outperformed the single models without ensembling, as well as state-of-the-art predictors on the same benchmark dataset. These results indicate the effectiveness of the proposed method for determining essential genes in particular and for other sequence-based problems in general.
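The abstract does not specify how sequences are split into "words", how the ensemble combines its members, or how the metrics were computed. The following is a minimal Python sketch of those three ideas under stated assumptions, not the authors' implementation: overlapping k-mers as words (the function name `kmer_tokens` and the default `k=3` are hypothetical), simple probability averaging as the ensemble rule (`soft_vote` is hypothetical), and the standard confusion-matrix definitions of sensitivity, specificity, accuracy, and MCC. AUC is omitted because it requires ranking scores rather than hard labels.

```python
from math import sqrt


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("words").

    Assumption: overlapping k-mers with stride 1; the paper's actual
    tokenization is not stated in the abstract.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def soft_vote(prob_lists):
    """Average per-sample probabilities from several models.

    Assumption: unweighted averaging; the paper's ensemble rule may differ.
    """
    n_models = len(prob_lists)
    return [sum(probs) / n_models for probs in zip(*prob_lists)]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and MCC from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # MCC is defined as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    print(kmer_tokens("ATGCA"))
    # Two hypothetical models scoring the same four sequences.
    averaged = soft_vote([[0.9, 0.4, 0.2, 0.1], [0.7, 0.3, 0.4, 0.3]])
    labels = [1 if p >= 0.5 else 0 for p in averaged]
    print(binary_metrics([1, 1, 0, 0], labels))
```

The gap between the reported sensitivity (60.2%) and specificity (84.6%) is the kind of imbalance that MCC summarizes in one number, which is why it is reported alongside accuracy.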

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d804/7730808/e77a1470c8e9/ijms-21-09070-g001.jpg
