Yang Zheng Rong, Thomson Rebecca
School of Engineering and Computer Science, Exeter University, Exeter EX4 4QF, UK.
IEEE Trans Neural Netw. 2005 Jan;16(1):263-74. doi: 10.1109/TNN.2004.836196.
The prediction of protease cleavage sites in proteins is critical to effective drug design. One of the important issues in constructing an accurate and efficient predictor is how to present nonnumerical amino acids to a model effectively. Because this issue has not yet received full attention and is closely tied to model efficiency and accuracy, we present a novel neural learning algorithm aimed at improving prediction accuracy and reducing training time. The algorithm builds on conventional radial basis function neural networks (RBFNNs) and is referred to as a bio-basis function neural network (BBFNN). The basic principle is to replace the radial basis function used in RBFNNs with a novel bio-basis function. Each bio-basis is a feature dimension in a numerical feature space, to which a nonnumerical sequence space is mapped for analysis. The bio-basis function is designed using an amino acid mutation matrix verified in biology, so the biological content in protein sequences can be maximally utilized for accurate modeling. Mutual information (MI) is used to select the most informative bio-bases, and an ensemble method is used to strengthen the decision-making process, further improving prediction accuracy. The algorithm has been successfully verified in two case studies: the prediction of human immunodeficiency virus (HIV) protease cleavage sites and of trypsin cleavage sites in proteins.
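The abstract describes the mapping from sequence space to a numerical feature space but does not state the bio-basis function itself. The following is a minimal sketch, assuming the normalized exponential form often given for bio-basis functions, K(x, b) = exp(gamma * (s(x, b) - s(b, b)) / s(b, b)), where s sums mutation-matrix scores over aligned positions. The names `TOY_SCORES`, `pair_score`, `similarity`, `bio_basis`, and `feature_vector`, the parameter `gamma`, and the four-letter score table are all hypothetical and chosen only for illustration; a real model would use a published amino acid mutation matrix.

```python
import math

# Hypothetical toy substitution scores over a 4-letter alphabet.
# These values are invented for illustration only; they are NOT taken
# from any published mutation matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "G"): 1, ("A", "L"): 0, ("A", "K"): 0,
    ("G", "G"): 5, ("G", "L"): 0, ("G", "K"): 0,
    ("L", "L"): 6, ("L", "K"): 1,
    ("K", "K"): 5,
}


def pair_score(a, b):
    """Look up the symmetric substitution score for residues a and b."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def similarity(x, b):
    """Sum substitution scores over aligned positions of two equal-length peptides."""
    if len(x) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(xi, bi) for xi, bi in zip(x, b))


def bio_basis(x, b, gamma=1.0):
    """Assumed bio-basis form: exp(gamma * (s(x, b) - s(b, b)) / s(b, b))."""
    s_bb = similarity(b, b)
    return math.exp(gamma * (similarity(x, b) - s_bb) / s_bb)


def feature_vector(x, bases, gamma=1.0):
    """Map a peptide to one numerical feature per bio-basis."""
    return [bio_basis(x, b, gamma) for b in bases]


bases = ["AGLK", "KLGA"]  # hypothetical bio-bases (reference peptides)
print(feature_vector("GALK", bases))
```

With this form, a peptide identical to a bio-basis scores exactly 1 on that dimension, and less similar peptides score closer to 0 as long as each residue's self-score is the largest in its row, as it is in the toy table. The resulting vector could then be passed to any linear output layer, as in an RBFNN.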