Signal peptide discrimination and cleavage site identification using SVM and NN.

Affiliations

London Metropolitan University, UK.

Portsmouth University, UK.

Publication information

Comput Biol Med. 2014 Feb;45:98-110. doi: 10.1016/j.compbiomed.2013.11.017. Epub 2013 Dec 1.

Abstract

About 15% of all proteins in a genome contain an N-terminal signal peptide (SP) sequence that targets the protein to intracellular secretory pathways. Once the protein is correctly targeted in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains owing to their similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The method uses a two-phase approach: an SVM serves as the primary classifier to discriminate SP sequences from Non-SP sequences, and NNs then predict the most likely cleavage site candidates. In phase one, the SVM uses hydrophobic propensities, extracted with a symmetric sliding window over the amino-acid sequence, as its primary feature vector for discriminating SP from Non-SP. In phase two, an NN uses asymmetric sliding window sequence analysis to identify the cleavage site. The proposed SVM-NN method was tested on UniProt non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results show an overall Matthews Correlation Coefficient (MCC) of 0.90 for SP and Non-SP discrimination using the SVM. For SP cleavage site prediction, the novel SVM-NN model achieves an overall accuracy of 91.5% in cross-validation tests.
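The abstract describes the two phases only in outline, so the Python sketch below illustrates the general idea under stated assumptions rather than reproducing the authors' method. It assumes the Kyte-Doolittle hydropathy scale as a stand-in for the unnamed hydrophobic propensities, a symmetric 7-residue window over the first 30 residues for phase one, and an asymmetric window of 13 residues upstream and 2 downstream of each candidate site, one-hot encoded, for phase two. All window sizes and function names (`symmetric_window_profile`, `asymmetric_windows`, `one_hot`, `cascade_predict`) are hypothetical, and the toy classifiers are placeholders where a trained SVM and NN would go.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Kyte-Doolittle hydropathy scale. ASSUMPTION: the abstract does not name
# the hydrophobic propensity scale the authors used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def symmetric_window_profile(seq: str, half_width: int = 3, n_terminal: int = 30) -> List[float]:
    """Phase one features (hypothetical): mean hydropathy centred on each residue.

    The window covers half_width residues on both sides of the centre; near the
    ends only the residues that exist are averaged. The profile is zero-padded
    to n_terminal values so every sequence yields a fixed-length SVM input.
    """
    region = seq[:n_terminal]
    profile = []
    for i in range(len(region)):
        window = region[max(0, i - half_width): i + half_width + 1]
        profile.append(sum(KD.get(aa, 0.0) for aa in window) / len(window))
    return profile + [0.0] * (n_terminal - len(profile))


def asymmetric_windows(seq: str, upstream: int = 13, downstream: int = 2) -> Iterator[Tuple[int, str]]:
    """Phase two inputs (hypothetical): one window per candidate cleavage site.

    A site at position p lies between residues p-1 and p (0-based). The window
    takes `upstream` residues before the site and `downstream` after it, so it
    is deliberately lopsided towards the signal peptide side.
    """
    for p in range(upstream, len(seq) - downstream + 1):
        yield p, seq[p - upstream: p + downstream]


def one_hot(window: str) -> List[int]:
    """Encode each residue as a 20-dimensional indicator vector (an assumed encoding)."""
    vec: List[int] = []
    for aa in window:
        vec.extend(1 if aa == a else 0 for a in AMINO_ACIDS)
    return vec


def cascade_predict(
    seq: str,
    sp_classifier: Callable[[List[float]], bool],
    site_scorer: Callable[[List[int]], float],
    upstream: int = 13,
    downstream: int = 2,
) -> Optional[int]:
    """Score cleavage sites only when phase one classifies the N-terminus as SP.

    sp_classifier and site_scorer are hypothetical stand-ins for a trained SVM
    and a trained NN; returns the best-scoring site position, or None.
    """
    if not sp_classifier(symmetric_window_profile(seq)):
        return None  # Non-SP: no cleavage site is predicted
    candidates = list(asymmetric_windows(seq, upstream, downstream))
    if not candidates:
        return None  # sequence too short for a single window
    best_pos, _ = max(candidates, key=lambda c: site_scorer(one_hot(c[1])))
    return best_pos


if __name__ == "__main__":
    # Toy stand-ins, NOT trained models: a hydrophobic-core threshold and a
    # scorer that rewards Ala at positions -3 and -1 (the "A-X-A" motif).
    def toy_svm(profile: List[float]) -> bool:
        return max(profile) > 1.0

    def toy_nn(vec: List[int]) -> float:
        a = AMINO_ACIDS.index("A")
        # With upstream=13, window slots 12 and 10 hold residues -1 and -3.
        return vec[12 * 20 + a] + vec[10 * 20 + a]

    print(cascade_predict("MK" + "L" * 11 + "AQASPDE", toy_svm, toy_nn))
    print(cascade_predict("MKKDDEERRKKDDEE", toy_svm, toy_nn))
```

Running the script prints 16 for the first toy sequence (a site just after its Ala-Gln-Ala stretch) and None for the hydrophilic one, which the toy phase-one rule rejects as Non-SP. Gating phase two on the phase-one decision mirrors the cascade described in the abstract: cleavage sites are scored only for sequences already classified as SP.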
