IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields.

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.

Publication information

Int J Mol Sci. 2018 Aug 22;19(9):2483. doi: 10.3390/ijms19092483.

DOI:10.3390/ijms19092483
PMID:30135358
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6164615/
Abstract

Accurate prediction of intrinsically disordered proteins/regions (IDPs/IDRs) is one of the most important tasks in bioinformatics, and several computational predictors have been proposed to solve this problem. Efficiently incorporating the sequence-order effect is critical for building an accurate predictor, because the distribution of disordered regions follows global sequence patterns. To capture these patterns, several sequence labelling models, such as conditional random fields (CRFs), have been applied to this field. However, these methods suffer from certain disadvantages. In this study, we propose a new computational predictor called IDP-CRF, which is trained on an updated benchmark dataset built from the MobiDB and DisProt databases and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), k-mers, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP-CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that it is a useful tool for identifying IDPs/IDRs. We anticipate that IDP-CRF will facilitate the development of protein sequence analysis.
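The abstract's key point is that a sequence labelling model lets neighbouring residues influence each other's order/disorder label. At prediction time, a linear-chain CRF does this by Viterbi decoding over per-residue (emission) scores plus label-to-label (transition) scores. The sketch below illustrates only that decoding step and is not the IDP-CRF implementation: the function name `viterbi`, the emission scores, and the transition matrix are hypothetical values chosen for illustration, not weights learned from PSSM, k-mer, secondary-structure, or solvent-accessibility features, and CRF training is omitted entirely.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label path of a linear-chain CRF.

    Labels in this sketch (hypothetical): 0 = ordered, 1 = disordered.
    emissions[i][y]:   score for label y at residue i (in a real predictor this
                       would be a weighted sum of per-residue features).
    transitions[a][b]: score for label a at residue i-1 followed by label b at i.
    """
    n = len(emissions)
    if n == 0:
        return []
    k = len(emissions[0])
    # best[y] = score of the best partial path ending in label y at the current residue
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for i in range(1, n):
        new_best: List[float] = []
        pointers: List[int] = []
        for b in range(k):
            # Choose the previous label that maximises the score of reaching label b.
            prev = max(range(k), key=lambda a: best[a] + transitions[a][b])
            new_best.append(best[prev] + transitions[prev][b] + emissions[i][b])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    label = max(range(k), key=lambda y: best[y])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy per-residue scores [ordered, disordered]; residue 2 weakly favours "ordered".
    emissions = [[0.0, 2.0], [0.0, 2.0], [0.5, 0.0], [0.0, 2.0], [0.0, 2.0]]
    sticky = [[1.0, -1.0], [-1.0, 1.0]]      # rewards keeping the same label
    independent = [[0.0, 0.0], [0.0, 0.0]]   # no coupling between neighbours
    print(viterbi(emissions, sticky))        # [1, 1, 1, 1, 1]
    print(viterbi(emissions, independent))   # [1, 1, 0, 1, 1]
```

With the "sticky" transitions, the weakly ordered residue 2 is absorbed into the surrounding disordered region; with all-zero transitions the decoder reduces to a per-residue argmax and leaves it labelled ordered. The transition term is what lets a CRF model the sequence-order effect that a per-residue classifier without such a term cannot express.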


