A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins.

Affiliation

Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078, USA.

Publication information

BMC Bioinformatics. 2012;13 Suppl 15(Suppl 15):S9. doi: 10.1186/1471-2105-13-S15-S9. Epub 2012 Sep 11.

DOI:10.1186/1471-2105-13-S15-S9
PMID:23046503
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3439722/
Abstract

BACKGROUND

Members of the phylum Proteobacteria are most prominent among bacteria causing plant diseases that result in a diminution of the quantity and quality of food produced by agriculture. To ameliorate these losses, there is a need to identify infections in early stages. Recent developments in next generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem but none have emerged as the best protocol. Here we attempt a systematic way to determine organismal origins of peptides by using a machine learning algorithm. The algorithm that we implement is a Support Vector Machine (SVM).

RESULTS

The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish a proteobacterial protein from a plant protein. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% and a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets, that is, data not used in training, to assess their validity.
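The three feature sets named above (AAC, DC, and the AAC+DC hybrid) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state how the compositions were normalized or how non-standard residues were handled, so the percentage scaling, the skipping of residues outside the 20 standard amino acids, and the function names `aac`, `dc`, and `hybrid` are all assumptions.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(sequence):
    """Amino acid composition: percentage of each standard residue (20 values).

    Assumption: residues outside the 20 standard letters are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    counts = {a: 0 for a in AMINO_ACIDS}
    for r in residues:
        counts[r] += 1
    return [100.0 * counts[a] / total for a in AMINO_ACIDS]


def dc(sequence):
    """Dipeptide composition: percentage of each overlapping residue pair (400 values).

    Assumption: a pair is skipped if either residue is non-standard.
    """
    seq = sequence.upper()
    counts = {d: 0 for d in DIPEPTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def hybrid(sequence):
    """Hybrid feature vector: AAC followed by DC (420 values)."""
    return aac(sequence) + dc(sequence)
```

Each function returns a fixed-length vector, so vectors computed for different proteins can be stacked into a matrix and passed to any SVM implementation. The kernel and parameters used in the paper are not given in the abstract and are left open here.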

CONCLUSION

The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.
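Accuracy and MCC, the two metrics reported in the results, can both be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. Treating proteobacterial proteins as the positive class is an assumption, and the counts in the usage lines are invented for illustration, not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal total is zero (the usual convention
    for an undefined denominator).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, with proteobacterial proteins as the assumed positive class.
example_accuracy = accuracy(tp=90, tn=95, fp=5, fn=10)
example_mcc = mcc(tp=90, tn=95, fp=5, fn=10)
print(f"accuracy={example_accuracy:.4f}  MCC={example_mcc:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size.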

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4b8/3439722/190f7dc83177/1471-2105-13-S15-S9-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4b8/3439722/fd699f2a00a4/1471-2105-13-S15-S9-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4b8/3439722/e56f8ccc6d3b/1471-2105-13-S15-S9-4.jpg
