Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy.

Author information

School of Biosciences, University of Exeter, Exeter EX4 5DE, UK.

Publication information

BMC Bioinformatics. 2009 Oct 29;10:361. doi: 10.1186/1471-2105-10-361.

DOI:10.1186/1471-2105-10-361

PMID:19874585

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2777180/

Abstract

BACKGROUND

Tyrosine sulfation is one of the most important posttranslational modifications. Because of its relevance to the development of various diseases, tyrosine sulfation has become a target for drug design. To facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful, with a claimed prediction accuracy of 98%. However, it has particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins.

RESULTS

A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. This new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10% compared with the previous predictor using the same blind data. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model has an excellent feature for ranking the residues flanking tyrosine sites, hence providing more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at http://ecsb.ex.ac.uk/sulfotyrosine for public use.
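The peptide construction and encoding steps described above can be illustrated with a minimal sketch. The abstract does not name the hydrophobicity scale, the window width, or how sequence termini are handled, so the Kyte-Doolittle scale, the `flank` parameter, the `X` padding character, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
# Kyte-Doolittle hydropathy values. The abstract does not say which
# hydrophobicity scale was used; this choice is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tyrosine_windows(sequence, flank=4, pad="X"):
    """Yield (position, peptide) for every tyrosine in `sequence`.

    Each peptide holds `flank` residues on either side of the tyrosine,
    so it is symmetric and 2 * flank + 1 residues long. Windows that run
    past a terminus are filled with `pad` (a hypothetical convention).
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue == "Y":
            # Position i in `seq` sits at i + flank in `padded`, so the
            # slice below is centred on the tyrosine.
            yield i, padded[i : i + 2 * flank + 1]


def encode(peptide, scale=KYTE_DOOLITTLE, unknown=0.0):
    """Map each residue to its scale value; unknown symbols map to `unknown`."""
    return [scale.get(residue, unknown) for residue in peptide]


if __name__ == "__main__":
    demo = "MDEYDYGAL"  # made-up sequence, not taken from the paper
    for position, peptide in tyrosine_windows(demo, flank=2):
        print(position, peptide, encode(peptide))
```

Each encoded peptide is a fixed-length numeric vector, which is the kind of input a random forest classifier expects. The position of each value within the vector corresponds to a position relative to the tyrosine, which is what makes a per-position ranking of flanking residues possible.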

CONCLUSION

The random forest algorithm is able to deliver a better model compared with the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. The success shows that the random forest algorithm together with an amino acid hydrophobicity scale encoding can be a good candidate for peptide classification.
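The measures used in the Results to compare predictors (sensitivity, specificity, total accuracy, and positive and negative predictive power) all follow from the four counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not figures from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification measures from confusion counts.

    tp / fn: sulfated tyrosines predicted as sulfated / non-sulfated.
    tn / fp: non-sulfated tyrosines predicted as non-sulfated / sulfated.
    """
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),            # true positive rate
        "specificity": ratio(tn, tn + fp),            # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),                    # positive predictive power
        "npv": ratio(tn, tn + fn),                    # negative predictive power
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in classification_metrics(tp=8, fp=2, tn=18, fn=2).items():
        print(f"{name}: {value:.3f}")
```

Reporting sensitivity separately from accuracy matters for a case like the one in the Background: when non-sulfated tyrosines far outnumber sulfated ones, a predictor can reach a high total accuracy while still missing many true sites.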
