Near perfect protein multi-label classification with deep neural networks.

Affiliations

PIT Bioinformatics Group, Eötvös University, H-1117 Budapest, Hungary.

PIT Bioinformatics Group, Eötvös University, H-1117 Budapest, Hungary; Uratim Ltd., H-1118 Budapest, Hungary.

Publication information

Methods. 2018 Jan 1;132:50-56. doi: 10.1016/j.ymeth.2017.06.034. Epub 2017 Jul 3.

Abstract

Biological sequences can be regarded as data items of high and non-fixed dimension, with the dimension determined by the length of each sequence. Comparing and classifying biological sequences against large databases are important areas of research today. Artificial neural networks (ANNs) have gained well-deserved popularity among machine learning tools following their recent successful applications in image and sound processing and in classification problems. ANNs have also been applied to predicting the family or function of a protein from its residue sequence. Here we present two new ANNs with multi-label classification ability, showing impressive accuracy when classifying protein sequences into 698 UniProt families (AUC=99.99%) and 983 Gene Ontology classes (AUC=99.45%).
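
The abstract does not describe how sequences or labels are encoded, so the following is only a minimal sketch of one common way to turn a variable-length residue sequence into a fixed-size network input and to represent several labels for one protein. The function names, the `max_len` parameter, and the truncate-and-zero-pad scheme are assumptions made for illustration, not the authors' method.

```python
# Illustrative sketch only: the encoding below is an assumption,
# not the scheme used in the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix of 0/1 values.

    Sequences longer than max_len are truncated; shorter ones are
    zero-padded, which gives every protein the same input shape.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence[:max_len]):
        idx = AA_INDEX.get(residue)
        if idx is not None:  # non-standard residues (e.g. X) stay all-zero
            matrix[pos][idx] = 1
    return matrix


def multi_hot_labels(labels, all_classes):
    """Return a 0/1 vector with one entry per class.

    Unlike a single-label target, several entries may be 1 at once,
    which is what makes the task multi-label.
    """
    label_set = set(labels)
    return [1 if c in label_set else 0 for c in all_classes]


# Hypothetical example: a 6-residue fragment padded to length 8,
# annotated with two of three candidate classes.
x = one_hot_encode("MKTAYX", max_len=8)
y = multi_hot_labels(["class_b", "class_c"], ["class_a", "class_b", "class_c"])
print(len(x), len(x[0]), y)
```

A multi-label network would typically pair a target like `y` with one independent sigmoid output per class, so that each class probability is scored separately; whether the two ANNs in the paper do this is not stated in the abstract.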
