A Survey of Biological Function Prediction Methods with Focus on Natural Language Processing (NLP) and Large Language Models (LLM).

Authors

Varghese Dana Mary, Athulya T, Mohani Vikash K, Ahmad Shandar

Affiliation

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.

Publication

Methods Mol Biol. 2025;2941:201-225. doi: 10.1007/978-1-0716-4623-6_13.


PMID: 40601260
Abstract

Predicting protein function from sequence, structure, gene expression profiles, and published literature is needed to understand all biological processes. Natural language processing of biological text and large language model (LLM)-based encoding of sequence and structure open powerful paths to rapid function annotation and novel training models. In this survey, we review the available models for function prediction, especially the NLP- and LLM-based models. The survey highlights the major advances made and the ground that still needs to be covered to automate function prediction from two major sources, namely protein sequences and published research documents.
