Feature selection may improve deep neural networks for the bioinformatics problems.

Affiliations

BioKnow Health Informatics Lab, College of Computer Science and Technology.

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China.

Publication information

Bioinformatics. 2020 Mar 1;36(5):1542-1552. doi: 10.1093/bioinformatics/btz763.

Abstract

MOTIVATION

Deep neural network (DNN) algorithms have recently been used to predict various biomedical phenotypes and have demonstrated very good prediction performance without feature selection. This study hypothesizes that DNN models may be further improved by feature selection algorithms.

RESULTS

A comprehensive comparative study was carried out by evaluating 11 feature selection algorithms on three conventional DNN algorithms, i.e. convolutional neural network (CNN), deep belief network (DBN) and recurrent neural network (RNN), and three recent DNNs, i.e. MobilenetV2, ShufflenetV2 and Squeezenet. Five binary classification methylomic datasets were chosen to calculate the prediction performance of CNN/DBN/RNN models using features selected by the 11 feature selection algorithms. Seventeen binary classification transcriptome datasets and two multi-class transcriptome datasets were also used to evaluate how well the hypothesis generalizes to other data types. The experimental data supported our hypothesis that feature selection algorithms may improve DNN models, and the DBN models using features selected by SVM-RFE usually achieved the best prediction accuracies on the five methylomic datasets.
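
The comparison described above (a network trained on all features versus the same network trained on SVM-RFE-selected features) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, scikit-learn's `MLPClassifier` is swapped in as a simple stand-in for the CNN/DBN/RNN models, and the feature counts, network size and other parameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a binary-classification omics matrix
# (samples x features); the real study used methylomic/transcriptome data.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=20,
    n_redundant=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize using training-set statistics only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# SVM-RFE: repeatedly fit a linear SVM and drop the lowest-weighted features.
# step=0.1 removes 10% of the original feature count per iteration
# (assumed value); 50 retained features is likewise an assumption.
selector = RFE(
    estimator=SVC(kernel="linear", C=1.0),
    n_features_to_select=50,
    step=0.1,
)
selector.fit(X_train_s, y_train)


def fit_and_score(train, test):
    """Train a small feed-forward network (stand-in for a DNN) and
    return its accuracy on the held-out test split."""
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    clf.fit(train, y_train)
    return clf.score(test, y_test)


acc_all = fit_and_score(X_train_s, X_test_s)
acc_selected = fit_and_score(
    selector.transform(X_train_s), selector.transform(X_test_s)
)
n_selected = int(selector.support_.sum())

print(f"features kept by SVM-RFE: {n_selected}")
print(f"accuracy, all features:      {acc_all:.3f}")
print(f"accuracy, selected features: {acc_selected:.3f}")
```

Fitting the selector on the training split only keeps the test split out of the feature ranking, which matters when the two accuracies are compared. Whether the selected-feature model actually scores higher depends on the dataset, which is the question the study evaluates empirically.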

AVAILABILITY AND IMPLEMENTATION

All the algorithms were implemented and tested under the programming environment Python version 3.6.6.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
