Varghese Dana Mary, Athulya T, Mohani Vikash K, Ahmad Shandar
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
Methods Mol Biol. 2025;2941:201-225. doi: 10.1007/978-1-0716-4623-6_13.
Predicting protein function from sequence, structure, gene expression profiles, and published literature is needed to understand all biological processes. Natural language processing (NLP) of biological text and large language model (LLM)-based encoding of sequence and structure open powerful paths to rapid function annotation and novel training models. In this survey, we review the available models for function prediction, especially the NLP- and LLM-based models. The survey highlights the major advances made and the ground that still needs to be covered to automate function prediction from two major sources, namely protein sequences and published research documents.