Shi Hongyan, Zhang Shengli
School of Mathematics and Statistics, Xidian University, Xi'an, 710071, People's Republic of China.
Interdiscip Sci. 2022 Dec;14(4):879-894. doi: 10.1007/s12539-022-00521-3. Epub 2022 Apr 27.
Hypertension (HT) is a common disease and one of the most frequent and important causes of cardiovascular disease. High blood pressure can lead to impaired heart and kidney function, cerebral hemorrhage, and myocardial infarction. Because laboratory methods are limited, identifying bioactive peptides for the treatment of HT takes a long time. Identifying anti-hypertensive peptides (AHTPs) is therefore of great practical significance. With the spread of machine learning, it has been suggested as a supplementary method for AHTP classification. We therefore develop a new model that identifies AHTPs from multiple features using deep learning. The deep model combines a convolutional neural network (CNN) with a gated recurrent unit (GRU). The convolutional structure reduces the feature dimension and the running time. The CNN output is fed into the recurrent GRU, whose reset and update gates filter the information and retain what is important. The output layer uses a sigmoid activation function. For feature extraction, we use Kmer, the deviation between the dipeptide frequency and the expected mean (DDE), encoding based on grouped weight (EBGW), enhanced grouped amino acid composition (EGAAC), and dipeptide binary profile and frequency (DBPF). Kmer, DDE, EBGW, and EGAAC are widely used in protein research. DBPF is a new feature representation that we designed: it maps dipeptides to binary numbers and yields a binary coding file and a frequency file. These features are concatenated and fed into the proposed model for prediction and analysis. Under tenfold cross-validation, the model reaches accuracies of 96.23% and 99.10%, a substantial improvement over previous methods.
This shows that combining convolution with a recurrent structure benefits the classification of AHTPs, and that the method is a feasible, efficient, and competitive sequence analysis tool for AHTPs. We also provide a user-friendly online prediction tool, freely accessible at http://ahtps.zhanglab.site/ .
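To make two of the feature descriptors named in the abstract concrete, the following is a minimal sketch of Kmer with k = 2 (dipeptide composition) and DDE. It assumes the DDE definition commonly given in the literature, in which the observed dipeptide frequency is standardized by a theoretical mean and variance derived from codon counts; the abstract does not state the formula, so this may differ from the authors' implementation. The function names, the constants, and the example peptide `IPP` are illustrative choices, not taken from the paper.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order so feature vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Number of codons encoding each amino acid in the standard genetic code
# (61 sense codons in total). Used by the assumed DDE definition below.
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = 61


def dipeptide_composition(seq):
    """Kmer (k = 2): frequency of each dipeptide among the len(seq) - 1 windows."""
    n = len(seq) - 1
    if n < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n):
        pair = seq[i:i + 2]
        if pair in counts:  # windows with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / n for d in DIPEPTIDES]


def dde(seq):
    """DDE (assumed definition): (observed - theoretical mean) / sqrt(theoretical variance)."""
    n = len(seq) - 1
    observed = dipeptide_composition(seq)
    features = []
    for pair, dc in zip(DIPEPTIDES, observed):
        # Theoretical mean: product of the codon-based probabilities of both residues.
        tm = (CODON_COUNTS[pair[0]] / TOTAL_CODONS) * (CODON_COUNTS[pair[1]] / TOTAL_CODONS)
        # Theoretical variance of a proportion estimated from n windows.
        tv = tm * (1 - tm) / n
        features.append((dc - tm) / sqrt(tv))
    return features


if __name__ == "__main__":
    peptide = "IPP"  # arbitrary short example sequence
    kmer_vec = dipeptide_composition(peptide)
    dde_vec = dde(peptide)
    # Concatenate descriptors into one feature vector, as the abstract describes.
    combined = kmer_vec + dde_vec
    print(len(combined))
```

Each descriptor here yields a 400-dimensional vector. Concatenating them mirrors the way the abstract describes joining the features before they are passed to the CNN-GRU model; EBGW, EGAAC, and DBPF are omitted because the abstract gives too little detail to reproduce them.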