Analysis and identification of beta-turn types using multinomial logistic regression and artificial neural network.

Author information

Asgary Mehdi Poursheikhali, Jahandideh Samad, Abdolmaleki Parviz, Kazemnejad Anoshirvan

Affiliation

Department of Biophysics, Faculty of Basic Sciences, Tarbiat Modares University, Tehran, Iran.

Publication information

Bioinformatics. 2007 Dec 1;23(23):3125-30. doi: 10.1093/bioinformatics/btm324. Epub 2007 Jun 28.

DOI:10.1093/bioinformatics/btm324

PMID:17599929

Abstract

MOTIVATION

Various statistical and machine learning techniques have been applied to the prediction of beta-turns. Most of these techniques have focused only on predicting the location of beta-turns in proteins. We developed a hybrid approach for the analysis and prediction of different types of beta-turn.

RESULTS

A two-stage hybrid model was developed to predict beta-turn Types I, II, IV and VIII. In the first stage, multinomial logistic regression was used, for the first time in beta-turn type prediction, to select significant parameters using a self-consistency test procedure. The extracted parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the beta-turn sequence, and the most significant of these were selected with the multinomial logistic regression model. Among them, the occurrences of glutamine, histidine, glutamic acid and arginine at positions i, i + 1, i + 2 and i + 3 of the beta-turn sequence, respectively, had an overall relationship with five beta-turn types. In the second stage, a neural network model was constructed and fed the parameters selected by multinomial logistic regression, yielding a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains by 9-fold cross-validation. The hybrid model gives Matthews correlation coefficients (MCC) of 0.235, 0.473, 0.103 and 0.124 for beta-turn Types I, II, IV and VIII, respectively. Our model also distinguished the different types of beta-turn in the embedded binary logit comparisons, which had not been carried out before.
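As an illustration only, the 100 input parameters and the per-type MCC described above might be computed along the following lines. This is a minimal sketch, not the authors' implementation. It assumes that the 80 positional occurrences are 0/1 indicators for each of the 20 standard amino acids at each of the four turn positions (4 x 20 = 80), that the 20 percentages are the amino acid composition of the four-residue turn, and that the MCC for each type is computed one-vs-rest. The names `encode_turn` and `mcc_one_vs_rest` are hypothetical.

```python
from math import sqrt

# The 20 standard amino acids as one-letter codes (the ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_turn(turn):
    """Encode a 4-residue beta-turn (positions i..i+3) as 100 features.

    Assumed layout: 80 positional indicators followed by 20 composition
    percentages. The paper's exact encoding may differ.
    """
    if len(turn) != 4 or any(aa not in AMINO_ACIDS for aa in turn):
        raise ValueError("expected four standard one-letter residue codes")
    # 80 positional occurrences: one indicator per (position, amino acid) pair.
    positional = [0.0] * 80
    for pos, aa in enumerate(turn):
        positional[pos * 20 + AMINO_ACIDS.index(aa)] = 1.0
    # 20 percentages: share of each amino acid among the four residues.
    percentages = [100.0 * turn.count(aa) / 4 for aa in AMINO_ACIDS]
    return positional + percentages


def mcc_one_vs_rest(true_labels, predicted_labels, turn_type):
    """Matthews correlation coefficient for one turn type against all others."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == turn_type and truth == turn_type:
            tp += 1
        elif pred == turn_type:
            fp += 1
        elif truth == turn_type:
            fn += 1
        else:
            tn += 1
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0 when the coefficient is undefined.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    features = encode_turn("QHER")  # Gln, His, Glu, Arg at i, i+1, i+2, i+3
    print(len(features))
    truth = ["I", "I", "II", "II"]       # made-up labels for demonstration
    predicted = ["I", "II", "II", "II"]  # made-up predictions
    print(mcc_one_vs_rest(truth, predicted, "I"))
```

In a two-stage pipeline of the kind the abstract describes, a multinomial logistic regression would then retain only the significant columns of such a feature vector before they are passed to the neural network; that selection step is omitted here because the abstract does not give its details.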

AVAILABILITY

Available on request from the authors.
