Prediction of protein subcellular localization based on variable-length motifs detection and dissimilarity based classification.

Authors

Arango-Argoty G A, Jaramillo-Garzón J A, Röthlisberger S, Castellanos-Dominguez C G

Affiliation

Signal Processing and Recognition Group, Universidad Nacional de Colombia, Campus La Nubia, Magdalena, Colombia.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2011;2011:945-8. doi: 10.1109/IEMBS.2011.6090213.

DOI:10.1109/IEMBS.2011.6090213

PMID:22254467

Abstract

Predicting the function of unknown proteins is one of the principal goals of computational biology. Knowing the subcellular localization of a protein helps in understanding its structure and molecular function. Numerous prediction techniques have been developed, usually focusing on global information about the protein. However, predictions can also be made by identifying functional subsequence patterns known as motifs. For the motif discovery problem, many methods require a predefined, fixed window size and aligned sequences. To address these problems, we propose a method based on variable-length motif characterization and detection using the continuous wavelet transform (CWT) and a dissimilarity space representation. To analyze the motifs generated by our approach, we divide the entire dataset into training (60%) and validation (40%) sets. A Support Vector Machine (SVM) classifier is used as the predictor for the validation set. The highest sensitivity (Sn = 82.58%) and specificity (Sp = 92.86%), across 10-fold cross-validation, are obtained for endosome proteins. The average results, Sn = 74% and Sp = 75.58%, are comparable to the current state of the art. For datasets with low sequence identity (< 40%), motif characterization and localization based on the CWT perform well and yield interpretable subsequences for each subcellular localization.
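As a rough illustration of two ideas named in the abstract, the sketch below maps feature vectors into a dissimilarity space and computes sensitivity (Sn) and specificity (Sp) from binary predictions. It is not the authors' implementation. The abstract does not state which dissimilarity measure or prototypes were used, so the Euclidean distance, the function names, and the toy data are all assumptions made for illustration, and the CWT feature extraction step is not modeled.

```python
from math import sqrt


def dissimilarity_representation(samples, prototypes):
    """Represent each sample by its distances to a set of prototypes.

    Assumption: Euclidean distance; the abstract does not name the measure.
    """
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return [[dist(s, p) for p in prototypes] for s in samples]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (Sn, Sp) for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return sn, sp


# Hypothetical toy feature vectors standing in for motif descriptors.
samples = [[0.0, 0.0], [3.0, 4.0]]
prototypes = [[0.0, 0.0], [6.0, 8.0]]
D = dissimilarity_representation(samples, prototypes)

# Hypothetical labels: 1 = protein belongs to the localization class.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
sn, sp = sensitivity_specificity(y_true, y_pred)
```

Each row of `D` could then serve as the input vector for a classifier such as an SVM, with Sn and Sp computed per localization class in a one-versus-rest fashion.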

