An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins.

Authors

Martelli Pier Luigi, Fariselli Piero, Casadio Rita

Affiliation

Laboratory of Biocomputing, CIRB/Department of Biology, University of Bologna, via Irnerio 42, 40126 Bologna, Italy.

Publication

Bioinformatics. 2003;19 Suppl 1:i205-11. doi: 10.1093/bioinformatics/btg1027.

DOI:10.1093/bioinformatics/btg1027

PMID:12855459

Abstract

MOTIVATION

All-alpha membrane proteins constitute a functionally relevant subset of the whole proteome. Based on sequence comparison and specific predictive methods, they account for about 10 to 30% of cell proteins. Because few membrane proteins have been solved at atomic resolution, the training/testing sets of predictive methods for protein topography and topology routinely include very few well-solved structures mixed with about a hundred proteins known only at low resolution. Moreover, available predictors fail to predict recently crystallised membrane proteins (Chen et al., 2002). The set of well-solved membrane proteins now comprises some 59 chains of low sequence homology. It is therefore possible to train and test predictors using only proteins known at atomic resolution, and to evaluate the performance of different methods more thoroughly.

RESULTS

We implement a cascade neural network (NN), two different hidden Markov models (HMM), and, as a new method, their ensemble (ENSEMBLE). We train and test the three methods and ENSEMBLE in cross-validation on the 59 well-resolved membrane proteins. ENSEMBLE achieves a per-protein accuracy of 90% for topography and 71% for topology, outperforming the best single method by 7 and 5 percentage points, respectively. When tested on a low-resolution set of 151 proteins with no homology to the 59 proteins, the per-protein accuracy of ENSEMBLE is 76% for topography and 68% for topology. Our results also indicate that ENSEMBLE performs better than the best predictors presently available on the Web.
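The abstract does not say how the three predictors are combined or what counts as a correct per-protein prediction. As a rough illustration only, the sketch below assumes that the per-residue transmembrane scores of the three methods are averaged, that segments are the runs of residues scoring above a threshold, and that a protein's topography is correct when the number of segments matches and each predicted segment overlaps its observed counterpart. The function names, the 0.5 threshold, the minimum segment length and the overlap criterion are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open residue range (start, end)


def average_scores(per_method: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue transmembrane scores across predictors (assumed rule)."""
    n = len(per_method[0])
    if any(len(s) != n for s in per_method):
        raise ValueError("all predictors must score the same residues")
    return [sum(col) / len(per_method) for col in zip(*per_method)]


def segments(scores: Sequence[float], threshold: float = 0.5,
             min_len: int = 3) -> List[Segment]:
    """Return runs of at least min_len residues scoring >= threshold."""
    found: List[Segment] = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                found.append((start, i))
            start = None
    return found


def same_topography(pred: Sequence[Segment], true: Sequence[Segment],
                    min_overlap: int = 1) -> bool:
    """Same segment count, and each pair shares at least min_overlap residues."""
    if len(pred) != len(true):
        return False
    return all(min(pe, te) - max(ps, ts) >= min_overlap
               for (ps, pe), (ts, te) in zip(pred, true))


def per_protein_accuracy(preds: Sequence[Sequence[Segment]],
                         truths: Sequence[Sequence[Segment]]) -> float:
    """Fraction of proteins whose whole set of segments is predicted correctly."""
    correct = sum(same_topography(p, t) for p, t in zip(preds, truths))
    return correct / len(truths)


# Toy scores for a six-residue stretch from three hypothetical predictors.
nn = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1]
hmm1 = [0.2, 0.6, 0.8, 0.9, 0.4, 0.0]
hmm2 = [0.0, 0.7, 0.7, 0.8, 0.3, 0.2]
ensemble_segments = segments(average_scores([nn, hmm1, hmm2]))
```

A topology score would additionally require the predicted orientation of the chain relative to the membrane to match the observed one, which this sketch leaves out.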

Similar articles

A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins.

Bioinformatics. 2002;18 Suppl 1:S46-53. doi: 10.1093/bioinformatics/18.suppl_1.s46.

Evaluation of methods for predicting the topology of beta-barrel outer membrane proteins and a consensus prediction method.

BMC Bioinformatics. 2005 Jan 12;6:7. doi: 10.1186/1471-2105-6-7.

MaxSubSeq: an algorithm for segment-length optimization. The case study of the transmembrane spanning segments.

Bioinformatics. 2003 Mar 1;19(4):500-5. doi: 10.1093/bioinformatics/btg023.

Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins.

BMC Bioinformatics. 2006 Apr 5;7:189. doi: 10.1186/1471-2105-7-189.

An improved hidden Markov model for transmembrane protein detection and topology prediction and its applications to complete genomes.

Bioinformatics. 2005 May 1;21(9):1853-8. doi: 10.1093/bioinformatics/bti303. Epub 2005 Feb 2.

Enhanced recognition of protein transmembrane domains with prediction-based structural profiles.

Bioinformatics. 2006 Feb 1;22(3):303-9. doi: 10.1093/bioinformatics/bti784. Epub 2005 Nov 17.

Learning to translate sequence and structure to function: identifying DNA binding and membrane binding proteins.

Ann Biomed Eng. 2007 Jun;35(6):1043-52. doi: 10.1007/s10439-007-9312-z. Epub 2007 Apr 13.

OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar.

Bioinformatics. 2008 Aug 1;24(15):1662-8. doi: 10.1093/bioinformatics/btn221. Epub 2008 May 12.

ZPRED: predicting the distance to the membrane center for residues in alpha-helical membrane proteins.

Bioinformatics. 2006 Jul 15;22(14):e191-6. doi: 10.1093/bioinformatics/btl206.

Cited by

Oral Metformin Inhibits Choroidal Neovascularization by Modulating the Gut-Retina Axis.

Invest Ophthalmol Vis Sci. 2023 Dec 1;64(15):21. doi: 10.1167/iovs.64.15.21.

Improving the topology prediction of α-helical transmembrane proteins with deep transfer learning.

Comput Struct Biotechnol J. 2022 Apr 20;20:1993-2000. doi: 10.1016/j.csbj.2022.04.024. eCollection 2022.

Machine Learning Does Not Improve Humeral Torsion Prediction Compared to Regression in Baseball Pitchers.

Int J Sports Phys Ther. 2022 Apr 1;17(3):390-399. doi: 10.26603/001c.32380. eCollection 2022.

Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences.

Front Genet. 2020 Nov 25;11:607812. doi: 10.3389/fgene.2020.607812. eCollection 2020.

Benchmarking subcellular localization and variant tolerance predictors on membrane proteins.

BMC Genomics. 2019 Jul 16;20(Suppl 8):547. doi: 10.1186/s12864-019-5865-0.

BUSCA: an integrative web server to predict subcellular localization of proteins.

Nucleic Acids Res. 2018 Jul 2;46(W1):W459-W466. doi: 10.1093/nar/gky320.

Discrimination of Native-like States of Membrane Proteins with Implicit Membrane-based Scoring Functions.

J Chem Theory Comput. 2017 Jun 13;13(6):3049-3059. doi: 10.1021/acs.jctc.7b00254. Epub 2017 May 11.

Computational Approaches for Revealing the Structure of Membrane Transporters: Case Study on Bilitranslocase.

Comput Struct Biotechnol J. 2017 Jan 31;15:232-242. doi: 10.1016/j.csbj.2017.01.008. eCollection 2017.

The Regulatory Domain of Squalene Monooxygenase Contains a Re-entrant Loop and Senses Cholesterol via a Conformational Change.

J Biol Chem. 2015 Nov 13;290(46):27533-44. doi: 10.1074/jbc.M115.675181. Epub 2015 Oct 3.

Research resource: EPSLiM: ensemble predictor for short linear motifs in nuclear hormone receptors.

Mol Endocrinol. 2014 May;28(5):768-77. doi: 10.1210/me.2014-1006. Epub 2014 Mar 28.
