Decoding functional proteome information in model organisms using protein language models.

Authors

Barrios-Núñez Israel, Martínez-Redondo Gemma I, Medina-Burgos Patricia, Cases Ildefonso, Fernández Rosa, Rojas Ana M

Affiliations

Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain.

Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain.

Publication information

NAR Genom Bioinform. 2024 Jul 2;6(3):lqae078. doi: 10.1093/nargab/lqae078. eCollection 2024 Sep.

Abstract

Protein language models have been tested and proved to be reliable when used on curated datasets but have not yet been applied to full proteomes. Accordingly, we tested how two different machine learning-based methods performed when decoding functional information from the proteomes of selected model organisms. We found that protein language models are more precise and informative than deep learning methods for all the species tested and across the three gene ontologies studied, and that they better recover functional information from transcriptomic experiments. The results obtained indicate that these language models are likely to be suitable for large-scale annotation and downstream analyses, and we recommend a guide for their use.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8169/11217674/afa1fcd95dfd/lqae078figgra1.jpg
