Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources.

Affiliation

Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada.

Publication information

Bioinformatics. 2010 Sep 15;26(18):i489-96. doi: 10.1093/bioinformatics/btq373.

DOI: 10.1093/bioinformatics/btq373

PMID: 20823312

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2935446/

Abstract

MOTIVATION

Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity, combined with the relatively small number of available annotations, motivate the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to improve, novel and improved predictors are still urgently needed.

RESULTS

We propose a novel method, named MFDp (Multilayered Fusion-based Disorder predictor), that aims to improve on current disorder predictors. MFDp is an ensemble of three Support Vector Machines specialized for the prediction of short, long and generic disordered regions. It combines information from three complementary disorder predictors with the sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. Our method uses a custom-designed set of features, based on raw predictions and aggregated raw values, that recognizes various types of disorder. MFDp is compared at the residue level on two datasets against eight recent disorder predictors and the top-performing methods from the most recent CASP8 experiment. Despite using training chains with ≤25% similarity to the test sequences, our method consistently and significantly outperforms the other methods based on the Matthews correlation coefficient (MCC). MFDp outperforms modern disorder predictors for binary disorder assignment and provides competitive real-valued predictions. MFDp's outputs are also shown to outperform the other methods in the identification of proteins with long disordered regions.
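The residue-level workflow summarized above (aggregated features, fusion of several per-residue disorder scores, binary assignment, MCC evaluation and detection of long disordered regions) can be illustrated with a minimal Python sketch. This is not MFDp's implementation. The window-averaging feature, the equal-weight averaging of three scores standing in for the short, long and generic SVM outputs, the 0.5 threshold and the 30-residue cutoff for a "long" region are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from typing import List, Sequence


def window_average(values: Sequence[float], half_width: int = 3) -> List[float]:
    """Hypothetical aggregation feature: mean of each residue's value over a
    window of 2*half_width+1 residues, truncated at the chain termini."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def fuse_scores(short_scores: Sequence[float], long_scores: Sequence[float],
                generic_scores: Sequence[float]) -> List[float]:
    """Hypothetical fusion rule: equal-weight average of three per-residue
    disorder scores (stand-ins for the short, long and generic SVM outputs)."""
    if not len(short_scores) == len(long_scores) == len(generic_scores):
        raise ValueError("score vectors must have equal length")
    return [(sh + lg + gn) / 3.0
            for sh, lg, gn in zip(short_scores, long_scores, generic_scores)]


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binary disorder assignment (1 = disordered, 0 = ordered); assumed threshold."""
    return [1 if x >= threshold else 0 for x in scores]


def mcc(pred: Sequence[int], true: Sequence[int]) -> float:
    """Residue-level Matthews correlation coefficient; returns 0.0 if undefined."""
    pairs = list(zip(pred, true))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def has_long_disordered_region(pred: Sequence[int], min_len: int = 30) -> bool:
    """True if at least min_len consecutive residues are predicted disordered
    (the 30-residue default is an assumed cutoff)."""
    run = 0
    for p in pred:
        run = run + 1 if p == 1 else 0
        if run >= min_len:
            return True
    return False


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    true = [0, 0, 1, 1, 1, 0, 0, 1]
    short_s = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.1, 0.4]
    long_s = [0.2, 0.1, 0.8, 0.9, 0.6, 0.2, 0.2, 0.7]
    generic_s = [0.1, 0.3, 0.7, 0.7, 0.8, 0.4, 0.1, 0.6]
    pred = binarize(fuse_scores(short_s, long_s, generic_s))
    print(pred)                              # → [0, 0, 1, 1, 1, 0, 0, 1]
    print(mcc(pred, true))                   # → 1.0
    print(has_long_disordered_region(pred))  # → False
```

MCC is used here because it stays informative when ordered residues greatly outnumber disordered ones, which is the usual situation in disorder benchmarks; a real ensemble would replace the averaging rule with trained models and tuned thresholds.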

AVAILABILITY

http://biomine.ece.ualberta.ca/MFDp.html.

Similar articles

MFDp2: Accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles.

Intrinsically Disord Proteins. 2013 Apr 1;1(1):e24428. doi: 10.4161/idp.24428. eCollection 2013 Jan-Dec.

In-silico prediction of disorder content using hybrid sequence representation.

BMC Bioinformatics. 2011 Jun 17;12:245. doi: 10.1186/1471-2105-12-245.

MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins.

Bioinformatics. 2012 Jun 15;28(12):i75-83. doi: 10.1093/bioinformatics/bts209.

DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences.

Bioinformatics. 2016 Jun 15;32(12):i341-i350. doi: 10.1093/bioinformatics/btw280.

Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

BMC Bioinformatics. 2008 Oct 10;9:430. doi: 10.1186/1471-2105-9-430.

Genome-scale prediction of proteins with long intrinsically disordered regions.

Proteins. 2014 Jan;82(1):145-58. doi: 10.1002/prot.24348. Epub 2013 Sep 17.

Length-dependent prediction of protein intrinsic disorder.

BMC Bioinformatics. 2006 Apr 17;7:208. doi: 10.1186/1471-2105-7-208.

Prediction of intrinsic disorder in proteins using MFDp2.

Methods Mol Biol. 2014;1137:147-62. doi: 10.1007/978-1-4939-0366-5_11.

Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

BMC Bioinformatics. 2009 Dec 13;10:414. doi: 10.1186/1471-2105-10-414.

Cited by

FusionEncoder: identification of intrinsically disordered regions based on multi-feature fusion.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf362.

IDP-EDL: enhancing intrinsically disordered protein prediction by combining protein language model and ensemble deep learning.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf182.

A conserved motif in Henipavirus P/V/W proteins drives the fibrillation of the W protein from Hendra virus.

Protein Sci. 2025 Apr;34(4):e70085. doi: 10.1002/pro.70085.

Accurate Prediction of Protein-Binding Residues in Protein Sequences Using SCRIBER.

Methods Mol Biol. 2025;2867:247-260. doi: 10.1007/978-1-0716-4196-5_15.

Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn.

Methods Mol Biol. 2025;2867:201-218. doi: 10.1007/978-1-0716-4196-5_12.

Recent Advances in Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences.

Methods Mol Biol. 2025;2870:1-19. doi: 10.1007/978-1-0716-4213-9_1.

Protein intrinsically disordered region prediction by combining neural architecture search and multi-objective genetic algorithm.

BMC Biol. 2023 Sep 7;21(1):188. doi: 10.1186/s12915-023-01672-5.

Computational prediction of disordered binding regions.

Comput Struct Biotechnol J. 2023 Feb 10;21:1487-1497. doi: 10.1016/j.csbj.2023.02.018. eCollection 2023.

Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions.

Genes (Basel). 2023 Feb 8;14(2):432. doi: 10.3390/genes14020432.

Prediction of protein-protein interaction sites in intrinsically disordered proteins.

Front Mol Biosci. 2022 Sep 30;9:985022. doi: 10.3389/fmolb.2022.985022. eCollection 2022.
