Automatic Gene Function Prediction in the 2020's.

Affiliations

Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands.

Keygene N.V., 6708PW Wageningen, The Netherlands.

Publication information

Genes (Basel). 2020 Oct 27;11(11):1264. doi: 10.3390/genes11111264.

DOI: 10.3390/genes11111264
PMID: 33120976
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7692357/

Abstract

The current rate at which new DNA and protein sequences are being generated is too fast to experimentally discover the functions of those sequences, emphasizing the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field for decades and has made considerable progress in that time. However, it is certainly not solved. In this paper, we describe challenges that the AFP field still has to overcome in the future to increase its applicability. The challenges we consider are how to: (1) include condition-specific functional annotation, (2) predict functions for non-model species, (3) include new informative data sources, (4) deal with the biases of Gene Ontology (GO) annotations, and (5) maximally exploit the GO to obtain performance gains. We also provide recommendations for addressing those challenges, by adapting (1) the way we represent proteins and genes, (2) the way we represent gene functions, and (3) the algorithms that perform the prediction from gene to function. Together, we show that AFP is still a vibrant research area that can benefit from continuing advances in machine learning with which AFP in the 2020s can again take a large step forward reinforcing the power of computational biology.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb16/7692357/616a7358a3ec/genes-11-01264-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb16/7692357/baa72c96d99c/genes-11-01264-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb16/7692357/aa9d3e560039/genes-11-01264-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb16/7692357/3c0c7ea82f63/genes-11-01264-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb16/7692357/fb22403ceba4/genes-11-01264-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb16/7692357/fb9343478c45/genes-11-01264-g006.jpg

