School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China.
School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China.
Comput Math Methods Med. 2020 Apr 3;2020:3974598. doi: 10.1155/2020/3974598. eCollection 2020.
The type III secretion system (T3SS) is a specialized protein delivery system in Gram-negative bacteria that delivers T3SS-secreted effectors (T3SEs) into host cells, causing pathological changes. Numerous experiments have verified that T3SEs play important roles in many biological activities and in host-pathogen interactions. Accurate identification of T3SEs is therefore essential for understanding the pathogenic mechanisms of bacteria; however, many existing experimental methods are time-consuming and expensive. Deep-learning methods have recently been applied successfully to T3SE recognition, but improving recognition accuracy remains a challenge. In this study, we developed ACNNT3, a new deep-learning framework based on the attention mechanism. We converted the 100 N-terminal residues of each protein sequence into a fused feature vector that combines primary-structure information (one-hot encoding) with a position-specific scoring matrix (PSSM); this vector serves as the input to the network model. We then embedded an attention layer into a convolutional neural network (CNN) to learn the characteristic preferences of type III effector proteins, so that any protein can be classified directly and accurately as either a T3SE or a non-T3SE. We found that introducing these new protein features improves the recognition accuracy of the model. Our method combines the advantages of CNNs and the attention mechanism and outperforms other popular methods on many metrics. On a commonly used independent dataset, our method is more accurate than previous methods, with improvements of 4.1% to 20.0%.
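The input encoding and attention step described in the abstract can be illustrated with a minimal sketch in plain Python. It assumes that a PSSM has already been computed for the protein (for example with PSI-BLAST) and is supplied as one row of 20 scores per residue. Several details are assumptions, not taken from the paper: the amino-acid column order, the logistic scaling of PSSM scores, zero-padding of sequences shorter than 100 residues, and the all-zero vector for non-standard residues. `attention_pool` is a generic dot-product attention pooling shown only to convey the idea; it is not claimed to be ACNNT3's actual attention layer. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# 20 standard amino acids; this column order is an assumption, not ACNNT3's.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_TERMINAL_LENGTH = 100  # number of N-terminal residues kept, per the abstract


def one_hot(residue: str) -> List[float]:
    """20-dim one-hot vector; non-standard residues (e.g. 'X') become all zeros (assumption)."""
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue.upper())
    if idx is not None:
        vec[idx] = 1.0
    return vec


def scale_pssm_row(row: Sequence[float]) -> List[float]:
    """Squash raw PSSM scores into (0, 1) with a logistic function (assumed normalization)."""
    return [1.0 / (1.0 + math.exp(-score)) for score in row]


def fuse_features(sequence: str, pssm: Sequence[Sequence[float]],
                  length: int = N_TERMINAL_LENGTH) -> List[List[float]]:
    """Build a length x 40 matrix: one-hot (20) + scaled PSSM (20) per position.

    Only the first `length` residues are used; shorter sequences are
    zero-padded at the C-terminal end (assumption).
    """
    if len(pssm) < min(len(sequence), length):
        raise ValueError("PSSM must have one row per residue")
    rows = []
    for i in range(length):
        if i < len(sequence):
            rows.append(one_hot(sequence[i]) + scale_pssm_row(pssm[i]))
        else:
            rows.append([0.0] * (2 * len(AMINO_ACIDS)))
    return rows


def attention_pool(features: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Generic dot-product attention pooling: softmax over positions, then a weighted sum.

    In a real model `features` would be per-position CNN outputs and
    `query` a learned vector; here both are plain inputs.
    """
    scores = [sum(f * q for f, q in zip(row, query)) for row in features]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * row[j] for w, row in zip(weights, features)) for j in range(dim)]
```

For example, `fuse_features("MSTKSL", [[0.0] * 20 for _ in range(6)])` returns a 100 x 40 matrix whose first six rows hold the one-hot codes plus PSSM values of 0.5 (the logistic of 0) and whose remaining rows are zero padding. Fixing the input to a constant 100-residue window gives the CNN a fixed-size input regardless of protein length.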