Elucidating the functional roles of prokaryotic proteins using big data and artificial intelligence.

Affiliations

Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany.

Wellcome Trust Sanger Institute, Hinxton, Saffron Walden CB10 1RQ, United Kingdom.

Publication information

FEMS Microbiol Rev. 2023 Jan 16;47(1). doi: 10.1093/femsre/fuad003.

DOI:10.1093/femsre/fuad003

PMID:36725215

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9960493/

Abstract

Annotating protein sequences according to their biological functions is one of the key steps in understanding microbial diversity, metabolic potentials, and evolutionary histories. However, even in the best-studied prokaryotic genomes, not all proteins can be characterized by classical in vivo, in vitro, and/or in silico methods, a challenge that is growing rapidly with the advent of next-generation sequencing technologies and the enormous expansion of 'omics' data in public databases. These so-called hypothetical proteins (HPs) represent a huge knowledge gap and hidden potential for biotechnological applications. Opportunities for leveraging the available 'Big Data' have recently proliferated with the use of artificial intelligence (AI). Here, we review the aims and methods of protein annotation and explain the different principles behind machine and deep learning algorithms, including recent research examples, in order to assist both biologists wishing to apply AI tools in developing comprehensive genome annotations and computer scientists who want to contribute to this leading edge of biological research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8778/9960493/489a7c60e4e1/fuad003fig1a.jpg
