

FusionEncoder: identification of intrinsically disordered regions based on multi-feature fusion.

Authors

Liu Sicen, Chen Shutao, Bai Tao, Liu Bin

Affiliations

SMBU-MSU-BIT Joint Laboratory on Bioinformatics and Engineering Biology, Shenzhen MSU-BIT University, Shenzhen, Guangdong 518172, China.

School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.

Publication information

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf362.

DOI:10.1093/bioinformatics/btaf362
PMID:40577786
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12231546/
Abstract

MOTIVATION

Intrinsically disordered regions (IDRs) are widely distributed in proteins and play important roles in diverse biological processes. Accurately predicting these regions is therefore essential for analyzing protein structure and function. Amino acid feature extraction serves as a foundational step in building computational predictive models. Existing methods typically rely either on traditional biological features (e.g. position-specific scoring matrices, PSSM) or on pre-trained protein language models (PPLMs) to capture sequence semantic information, and they often combine the two by straightforward feature concatenation. However, these approaches fail to capture the multi-semantic interactions between traditional biological features and PPLM-based features.
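For contrast with FusionEncoder, the "straightforward feature concatenation" baseline mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the random matrices are placeholders standing in for a real PSSM (20 columns, one per standard amino acid) and a per-residue PPLM embedding whose width of 1024 is an assumption. The function name `concat_features` is hypothetical.

```python
import numpy as np


def concat_features(pssm: np.ndarray, plm_emb: np.ndarray) -> np.ndarray:
    """Hypothetical baseline: place each residue's PSSM row and PPLM vector side by side."""
    if pssm.shape[0] != plm_emb.shape[0]:
        raise ValueError("both feature matrices must have one row per residue")
    return np.concatenate([pssm, plm_emb], axis=1)


L = 50  # hypothetical sequence length
rng = np.random.default_rng(0)
pssm = rng.normal(size=(L, 20))       # placeholder for a real PSSM (20 amino-acid columns)
plm_emb = rng.normal(size=(L, 1024))  # placeholder PPLM embedding; width 1024 is assumed
fused = concat_features(pssm, plm_emb)
print(fused.shape)  # (50, 1044)
```

Each residue ends up with one flat 1044-dimensional vector, and any interaction between the two feature groups is left entirely to downstream layers, which is the limitation the Motivation paragraph points to.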

RESULTS

In this study, we propose FusionEncoder, a method that integrates traditional biological features and PPLM-based features of proteins. FusionEncoder is a fusion network built on a variant of long short-term memory (LSTM). We treat traditional biological features and PPLM-based features as two types of semantic input within a "multi-semantic" space. Traditional features are fed into the cell state of the LSTM, while PPLM-based features are fed into its input part. A fusion cell then fuses these two types of features. This strategy exploits the ability of LSTMs to encode long sequences, enhancing context-aware semantic learning of amino acid sequences. Finally, a transformer-based encoder layer is used to predict the IDRs. Evaluation on four independent test datasets indicates that FusionEncoder clearly improves the accuracy of amino acid feature representation and outperforms other existing methods.
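The Results paragraph describes the architecture only in words, so the following is a hedged NumPy sketch of one way such a fusion recurrence could work, not the authors' implementation. Every detail below is an assumption: the PPLM features and the previous hidden state drive the usual LSTM gates; the traditional features are projected by a hypothetical matrix `W_trad` and added to the carried cell state before the forget gate acts (a stand-in for the "fusion cell"); and a single-head self-attention layer with a residual connection plus a per-residue sigmoid stands in for the transformer-based encoder layer, with layer normalization and the feed-forward block omitted. The names `FusionLSTMSketch`, `self_attention`, and `predict_disorder` are hypothetical, and the weights are random, so the outputs are not meaningful predictions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class FusionLSTMSketch:
    """Hypothetical LSTM variant: PPLM features feed the gates (input part),
    traditional features are injected additively into the cell state."""

    def __init__(self, d_plm, d_trad, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Gate weights act on [x_i, h_{i-1}]: PPLM input plus previous hidden state.
        self.W_gates = rng.normal(scale=0.1, size=(4 * d_hidden, d_plm + d_hidden))
        self.b_gates = np.zeros(4 * d_hidden)
        # Assumed projection of traditional features into the cell-state space.
        self.W_trad = rng.normal(scale=0.1, size=(d_hidden, d_trad))
        self.d_hidden = d_hidden

    def forward(self, plm, trad):
        H = self.d_hidden
        h, c = np.zeros(H), np.zeros(H)
        outputs = np.zeros((plm.shape[0], H))
        for i in range(plm.shape[0]):
            z = self.W_gates @ np.concatenate([plm[i], h]) + self.b_gates
            f = sigmoid(z[0:H])               # forget gate
            g_in = sigmoid(z[H:2 * H])        # input gate
            o = sigmoid(z[2 * H:3 * H])       # output gate
            cand = np.tanh(z[3 * H:4 * H])    # candidate values from PPLM features
            # Assumed fusion step: add projected traditional features to the carried cell state.
            c_fused = c + np.tanh(self.W_trad @ trad[i])
            c = f * c_fused + g_in * cand
            h = o * np.tanh(c)
            outputs[i] = h
        return outputs


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over residues."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability before exp
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-wise softmax
    return A @ V


def predict_disorder(H, rng):
    """Simplified stand-in for the transformer encoder layer plus a sigmoid output head."""
    d = H.shape[1]
    Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
    Z = H + self_attention(H, Wq, Wk, Wv)  # residual connection
    w_out = rng.normal(scale=0.1, size=d)
    return sigmoid(Z @ w_out)  # one disorder probability per residue


L, d_plm, d_trad, d_hidden = 30, 64, 20, 16
rng = np.random.default_rng(1)
plm = rng.normal(size=(L, d_plm))    # placeholder PPLM embeddings
trad = rng.normal(size=(L, d_trad))  # placeholder traditional features (e.g. PSSM rows)
model = FusionLSTMSketch(d_plm, d_trad, d_hidden)
H = model.forward(plm, trad)
probs = predict_disorder(H, rng)
print(H.shape, probs.shape)  # (30, 16) (30,)
```

The point of the sketch is the contrast with concatenation: here the traditional features never share an input vector with the PPLM features, but instead modulate the recurrent memory that the PPLM-driven gates read and write at every position.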

AVAILABILITY AND IMPLEMENTATION

To make the predictor accessible to experimental researchers, a user-friendly, publicly available FusionEncoder webserver has been deployed at http://bliulab.net/FusionEncoder/. FusionEncoder is expected to serve as a valuable tool for the accurate identification of IDRs.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32e0/12231546/3a63fd0555fb/btaf362f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32e0/12231546/18c6f59e68fa/btaf362f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32e0/12231546/a9ed9b79efcc/btaf362f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32e0/12231546/ac1599a82839/btaf362f3.jpg

