A Transferability-Based Method for Evaluating the Protein Representation Learning.

Authors

Hu Fan, Zhang Weihong, Huang Huazhen, Li Wang, Li Yang, Yin Peng

Publication information

IEEE J Biomed Health Inform. 2024 May;28(5):3158-3166. doi: 10.1109/JBHI.2024.3370680. Epub 2024 May 6.

DOI:10.1109/JBHI.2024.3370680
PMID:38416611
Abstract

Self-supervised pre-trained language models have recently risen as a powerful approach in learning protein representations, showing exceptional effectiveness in various biological tasks, such as drug discovery. Amidst the evolving trend in protein language model development, there is an observable shift towards employing large-scale multimodal and multitask models. However, the predominant reliance on empirical assessments using specific benchmark datasets for evaluating these models raises concerns about the comprehensiveness and efficiency of current evaluation methods. Addressing this gap, our study introduces a novel quantitative approach for estimating the performance of transferring multi-task pre-trained protein representations to downstream tasks. This transferability-based method is designed to quantify the similarities in latent space distributions between pre-trained features and those fine-tuned for downstream tasks. It encompasses a broad spectrum, covering multiple domains and a variety of heterogeneous tasks. To validate this method, we constructed a diverse set of protein-specific pre-training tasks. The resulting protein representations were then evaluated across several downstream biological tasks. Our experimental results demonstrate a robust correlation between the transferability scores obtained using our method and the actual transfer performance observed. This significant correlation highlights the potential of our method as a more comprehensive and efficient tool for evaluating protein representation learning.
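The abstract does not name the similarity measure or the correlation statistic used, so the following is only a minimal sketch of the general idea under stated assumptions: linear centered kernel alignment (CKA) stands in for the latent-space similarity, Pearson correlation stands in for the score-versus-performance check, and all function names and the synthetic features are hypothetical rather than the authors' implementation.

```python
import numpy as np


def linear_cka(feats_a, feats_b):
    """Linear CKA between two (n_samples, dim) feature matrices.

    Assumption: CKA is a stand-in similarity; the paper's actual
    transferability score is not specified in the abstract.
    """
    # Center each feature dimension across samples.
    a = feats_a - feats_a.mean(axis=0, keepdims=True)
    b = feats_b - feats_b.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by
    # the Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(b.T @ a, ord="fro") ** 2
    norm_a = np.linalg.norm(a.T @ a, ord="fro")
    norm_b = np.linalg.norm(b.T @ b, ord="fro")
    return float(cross / (norm_a * norm_b))


def pearson(scores, performance):
    """Pearson correlation between similarity scores and task performance."""
    return float(np.corrcoef(scores, performance)[0, 1])


# Synthetic stand-ins for real embeddings (hypothetical data).
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(200, 32))

# A fine-tuned space that is only a rotation of the pre-trained one.
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
finetuned_close = pretrained @ rotation

# A fine-tuned space unrelated to the pre-trained one.
finetuned_far = rng.normal(size=(200, 32))

score_close = linear_cka(pretrained, finetuned_close)
score_far = linear_cka(pretrained, finetuned_far)
```

In this toy setup the rotated features score close to 1 because linear CKA is invariant to orthogonal transformations, while the unrelated features score much lower. With real models, `pearson` would be applied to one score per pre-training task and the measured downstream performance for that task.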
