Identification of functionally diverse lipocalin proteins from sequence information using support vector machine.

Affiliation

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore.

Publication details

Amino Acids. 2010 Aug;39(3):777-83. doi: 10.1007/s00726-010-0520-8. Epub 2010 Feb 26.

PMID:20186553

Abstract

Lipocalins are functionally diverse proteins composed of 120-180 amino acid residues. Members of this family have several important biological functions, including ligand transport, cryptic coloration, sensory transduction, endonuclease activity, stress response in plants, odorant binding, prostaglandin biosynthesis, regulation of cellular homeostasis, immunity and immunotherapy. Identifying lipocalins from protein sequence is challenging because sequence identity among family members is low and often falls below the twilight zone. So far, no specific method has been reported to identify lipocalins from primary sequence. In this paper, we report a support vector machine (SVM) approach, LipoPred, which predicts lipocalins from protein sequence using sequence-derived properties. LipoPred was trained on a dataset of 325 lipocalin and 325 non-lipocalin proteins and evaluated on an independent test set of 140 lipocalin and 21,447 non-lipocalin proteins. On the training dataset, LipoPred achieved 88.61% accuracy, with 89.26% sensitivity, 85.27% specificity and a Matthews correlation coefficient (MCC) of 0.74. On the independent test dataset, it achieved 84.25% accuracy, with 88.57% sensitivity, 84.22% specificity and an MCC of 0.16. LipoPred performed better than the PSI-BLAST, HMM and SVM-Prot methods. Of 218 lipocalins, LipoPred correctly predicted 194, including 39 lipocalins that are non-homologous to any protein in the SWISSPROT database. This result shows that LipoPred is potentially useful for predicting lipocalins that have no sequence homologs in sequence databases. Further, the successful prediction of nine hypothetical lipocalin proteins and five new members of the lipocalin family shows that LipoPred can be used efficiently to identify and annotate new lipocalin proteins in sequence databases. The LipoPred software and dataset are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/lipopred.htm.
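The abstract does not list which sequence-derived properties LipoPred feeds to its SVM. As a hedged illustration only, the sketch below computes amino acid composition, one common sequence-derived property, assumed here for demonstration; it is not presented as LipoPred's actual feature set. The function name `amino_acid_composition` is hypothetical. Each protein becomes a fixed-length 20-dimensional vector, which is the kind of input an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`) expects.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes) in a fixed order, so every
# sequence maps onto the same feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Hypothetical helper: fraction of each standard residue in `sequence`.

    Non-standard symbols (e.g. X, B, Z) are skipped and excluded from the
    denominator. An assumed feature, not necessarily one LipoPred uses.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector of the same length.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("MKVLAAGX")
print(len(features), features[0])  # 20 features; 'A' occurs 2 times out of 7 standard residues
```

Composition vectors ignore residue order, which is one reason predictors of this kind often combine several sequence-derived properties rather than relying on a single one.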

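The test-set figures show high accuracy but an MCC of only 0.16. The sketch below computes the four reported metrics from a confusion matrix to show why. The counts are an assumption: they are approximately back-calculated from the reported percentages and set sizes (140 lipocalins, 21,447 non-lipocalins) and are not values published in the abstract. The helper name `classification_metrics` is hypothetical.

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Hypothetical helper: accuracy, sensitivity, specificity and Matthews CC."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # fraction of lipocalins correctly recovered
    specificity = tn / (tn + fp)  # fraction of non-lipocalins correctly rejected
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# ASSUMED counts, back-calculated from the reported test-set percentages;
# not published values.
tp, fn = 124, 16   # 124 / 140 is about 88.57 % sensitivity
tn = 18063         # 18063 / 21447 is about 84.22 % specificity
fp = 21447 - tn    # remaining non-lipocalins are false positives (3384)

metrics = classification_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these approximate counts, false positives (about 3,384) far outnumber true positives (124) in the heavily imbalanced test set. Accuracy and specificity stay above 84% because most non-lipocalins are rejected, while the MCC, which accounts for all four cells of the confusion matrix, drops to roughly 0.16.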