Department of Molecular Biology, Institute of Biochemistry, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland.
College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, 02-097 Warsaw, Poland.
Int J Mol Sci. 2022 Mar 9;23(6):2966. doi: 10.3390/ijms23062966.
For decades, new biomolecular structures have been solved faster than they can be manually classified and characterised. A new, comprehensive tool for examining them is therefore needed.
Here we propose the Biological Sequence and Structure Network (BioS2Net), a novel deep neural network architecture that extracts both sequential and structural information from biomolecules. The architecture consists of four main parts: (i) a sequence convolutional extractor, (ii) a 3D structure extractor, (iii) a 3D structure-aware sequence temporal network, and (iv) a fusion and classification network.
We evaluated our approach on two protein fold classification datasets. BioS2Net achieved a mean class accuracy of 95.4% on the eDD dataset and 76% on the F184 dataset. Its accuracy on the eDD dataset was comparable to that of previously published methods, placing the algorithm described in this article among the top-performing solutions for protein fold recognition.
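The abstract reports mean class accuracy but does not define it. The sketch below assumes the common definition: the fraction of correctly classified samples is computed separately for each true class, and those per-class values are averaged without weighting, so rare folds count as much as frequent ones. The function name and the toy labels are hypothetical and do not come from the article or its datasets.

```python
from collections import defaultdict


def mean_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Average over classes, not over samples.
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Toy example with two hypothetical fold labels of unequal size.
y_true = ["fold_a", "fold_a", "fold_a", "fold_a", "fold_b", "fold_b"]
y_pred = ["fold_a", "fold_a", "fold_a", "fold_b", "fold_b", "fold_a"]
score = mean_class_accuracy(y_true, y_pred)
```

In this toy case the per-class accuracies are 3/4 and 1/2, giving a mean class accuracy of 0.625, whereas plain sample-level accuracy would be 4/6. The gap shows why the two metrics can differ on imbalanced fold datasets.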
BioS2Net is a novel tool for the holistic examination of biomolecules of known structure and sequence. It is a reliable tool for analysing proteins and representing them uniformly as feature vectors.