DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences.

Authors

Meng Fanchi, Kurgan Lukasz

Affiliations

Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada.

Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada; Department of Computer Science, Virginia Commonwealth University, Richmond, 23284, U.S.A.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i341-i350. doi: 10.1093/bioinformatics/btw280.

DOI:10.1093/bioinformatics/btw280

PMID:27307636

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908364/

Abstract

MOTIVATION

Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They are different from flexible linkers/residues because they are disordered and longer. Availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, there are no computational methods that directly predict DFLs and they can be found only indirectly by filtering predicted flexible residues with predictions of disorder.
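The indirect route mentioned above, filtering predicted flexible residues with disorder predictions, can be illustrated with a minimal sketch. It assumes both upstream predictors emit one binary call per residue and that "filtering" means keeping residues flagged by both; the function name and the toy values are hypothetical and do not come from any specific tool.

```python
from typing import List


def filter_flexible_by_disorder(flexible: List[bool], disordered: List[bool]) -> List[bool]:
    """Keep a residue only if it is predicted both flexible and disordered.

    Assumption: this per-residue logical AND is one plausible reading of
    the indirect approach; the abstract does not specify the exact rule.
    """
    if len(flexible) != len(disordered):
        raise ValueError("per-residue predictions must have equal length")
    return [f and d for f, d in zip(flexible, disordered)]


# Toy per-residue predictions for a 6-residue sequence (made-up values).
flexible = [True, True, False, True, False, True]
disordered = [False, True, True, True, False, False]
indirect_dfl = filter_flexible_by_disorder(flexible, disordered)
print(indirect_dfl)
```

Only residues flagged by both inputs survive, so the quality of such an indirect prediction is bounded by the weaker of the two upstream predictors.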

RESULTS

We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs the propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale; it needs <1 h to predict an entire proteome on a single CPU. When assessed on an independent test dataset with low sequence-identity proteins, it secures an area under the receiver operating characteristic curve equal to 0.715 and outperforms existing alternatives that include methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a large content of over 30% DFL residues. We also estimate that about 6000 DFL regions are long with ≥30 consecutive residues.
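The quantities reported above (per-protein DFL content, long regions of ≥30 consecutive residues, and the area under the ROC curve) can all be derived from per-residue propensities. The sketch below shows one way to compute them. It does not reproduce DFLpred's features or its linear model; the 0.5 cutoff, the function names and the toy data are assumptions made for illustration only.

```python
from typing import List, Tuple


def dfl_content(propensity: List[float], threshold: float) -> float:
    """Fraction of residues whose propensity reaches the cutoff."""
    return sum(p >= threshold for p in propensity) / len(propensity)


def long_regions(propensity: List[float], threshold: float,
                 min_len: int = 30) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans of >= min_len consecutive predicted residues."""
    regions, start = [], None
    for i, p in enumerate(propensity):
        if p >= threshold:
            if start is None:
                start = i  # a new run of predicted residues begins
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(propensity) - start >= min_len:
        regions.append((start, len(propensity)))
    return regions


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a positive residue outscores a negative one.

    Ties count as half. This pairwise form is quadratic in the number of
    residues, which is fine for a sketch but slow for proteome-scale data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


THRESHOLD = 0.5  # hypothetical cutoff, not taken from the paper

# Toy 100-residue protein: the first 35 residues have high propensity.
toy = [0.9] * 35 + [0.1] * 65
content = dfl_content(toy, THRESHOLD)
regions = long_regions(toy, THRESHOLD)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(content, regions, auc)
```

With these helpers, a protein would count toward the "over 30% DFL residues" group when `dfl_content` exceeds 0.3, and each span returned by `long_regions` would count as one long DFL region.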

AVAILABILITY AND IMPLEMENTATION

http://biomine.ece.ualberta.ca/DFLpred/

CONTACT

lkurgan@vcu.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
