SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method.

Author information

Vasylenko Tamara, Liou Yi-Fan, Chen Hong-An, Charoenkwan Phasit, Huang Hui-Ling, Ho Shinn-Ying

Publication information

BMC Bioinformatics. 2015;16 Suppl 1(Suppl 1):S8. doi: 10.1186/1471-2105-16-S1-S8. Epub 2015 Jan 21.

DOI:10.1186/1471-2105-16-S1-S8

PMID:25708243

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4331707/

Abstract

BACKGROUND

Photosynthetic proteins (PSPs) differ greatly in structure and function because they are involved in numerous subprocesses that take place inside an organelle called the chloroplast. Few studies predict PSPs from sequences, owing to the high variety of their sequences and structures. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods.

RESULTS

A novel bioinformatics method for predicting and characterizing PSPs based on a scoring card method (SCMPSP) was used. First, a dataset of 649 PSPs, identified using the Gene Ontology term GO:0015979, and 649 non-PSPs was established from the SwissProt database with sequence identity <= 25%. Several prediction methods are presented based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features performed well and yielded a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of 400 dipeptides to be PSPs and has a test accuracy of 71.54%, which is comparable to that of the SVM method. The derived propensity scores of the 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal the following four characteristics of PSPs: 1) PSPs favour amino acids with hydrophobic side chains; 2) PSPs are composed of amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs tend to be composed of amino acids with electron-reactive side chains.
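To make the scoring idea concrete, the following is a minimal Python sketch of how a score card over 400 dipeptides could be used to score a sequence and to derive per-amino-acid propensities. It is an illustration under stated assumptions, not the authors' implementation: the function names, the initial-score heuristic (mean composition difference rescaled to 0..1000), the averaging rule for amino acids, and the toy sequences are all hypothetical, and any optimization of the scores or threshold is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Fraction of each dipeptide among the overlapping residue pairs of seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return dict.fromkeys(DIPEPTIDES, 0.0)
    return {dp: c / total for dp, c in counts.items()}


def initial_propensity_scores(positives, negatives):
    """Assumed heuristic: mean composition in positives minus mean composition
    in negatives, rescaled linearly to the range 0..1000."""
    def mean_composition(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}

    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    diff = {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all differences match
    return {dp: 1000.0 * (d - lo) / span for dp, d in diff.items()}


def scm_score(seq, scores):
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(seq)
    return sum(comp[dp] * scores[dp] for dp in DIPEPTIDES)


def predict(seq, scores, threshold):
    """Label a sequence positive when its score exceeds the threshold."""
    return scm_score(seq, scores) > threshold


def amino_acid_propensity(scores):
    """Assumed rule: average the scores of all dipeptides containing the residue."""
    result = {}
    for aa in AMINO_ACIDS:
        related = [scores[dp] for dp in DIPEPTIDES if aa in dp]
        result[aa] = sum(related) / len(related)
    return result


# Toy, made-up sequences: leucine-rich "positives", lysine-rich "negatives".
toy_positives = ["LLLLAL", "ALLLLL"]
toy_negatives = ["KKKKDK", "DKKKKK"]
toy_scores = initial_propensity_scores(toy_positives, toy_negatives)
toy_aa = amino_acid_propensity(toy_scores)
```

With these toy inputs, a leucine-rich sequence scores higher than a lysine-rich one, and leucine receives a higher derived propensity than lysine. In a real setting the threshold passed to `predict` would be tuned on training data rather than chosen by hand.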

CONCLUSIONS

The SCMPSP method not only estimates the propensity of a sequence to be a PSP but also reveals characteristics that improve the understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.
