Mutual annotation-based prediction of protein domain functions with Domain2GO.

Affiliations

Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey.

Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey.

Publication information

Protein Sci. 2024 Jun;33(6):e4988. doi: 10.1002/pro.4988.

Abstract

Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease. The domain composition of a protein can reveal critical information in this context, as domains are the structural and functional units that dictate how a protein acts at the molecular level. Because wet-lab experimental approaches are expensive and time-consuming, researchers have developed computational strategies for predicting protein functions. In this study, we propose a new method called Domain2GO that infers associations between protein domains and function-defining Gene Ontology (GO) terms, thereby recasting the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with the proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings through a literature review. We then applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we used the Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as interpretable results and an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex, large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO that simplifies predicting the functions of previously unannotated proteins using only their amino acid sequences.
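The abstract describes two steps: scoring how often a domain and a GO term annotate the same proteins, keeping only associations that survive statistical resampling, and then transferring the retained GO terms to any protein that carries the domain. The Python sketch below illustrates that idea under stated assumptions; it is not the Domain2GO implementation. It assumes a Jaccard-style co-annotation score and a simple resampling null in which a GO term's protein set is replaced by a random protein set of the same size, with an add-one empirical p-value and a fixed cutoff (`alpha`). The function names (`map_domains_to_go`, `propagate`, `jaccard`, `invert`), the labels `PF_A`, `GO:A` and so on, and the toy data are all hypothetical; the authors' actual scoring, resampling scheme and thresholds may differ and are in the GitHub repository linked above.

```python
import random
from collections import defaultdict


def invert(annotations):
    """Map each label (domain or GO term) to the set of proteins annotated with it."""
    index = defaultdict(set)
    for protein, labels in annotations.items():
        for label in labels:
            index[label].add(protein)
    return index


def jaccard(a, b):
    """Assumed co-annotation score: |A & B| / |A | B| over two protein sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_domains_to_go(domain_ann, go_ann, n_resamples=200, alpha=0.05, seed=0):
    """Hypothetical domain-GO mapping: keep pairs that beat random protein sets."""
    rng = random.Random(seed)
    # Only proteins that carry both domain and GO annotations are informative.
    proteins = sorted(set(domain_ann) & set(go_ann))
    dom_idx = invert({p: domain_ann[p] for p in proteins})
    go_idx = invert({p: go_ann[p] for p in proteins})
    mappings = {}
    # Sorted iteration keeps the order of random draws reproducible for a given seed.
    for domain, d_prots in sorted(dom_idx.items()):
        for term, g_prots in sorted(go_idx.items()):
            if not d_prots & g_prots:
                continue  # never co-annotated, so there is nothing to test
            observed = jaccard(d_prots, g_prots)
            # Assumed null: the GO term sits on a random protein set of the same size.
            hits = sum(
                jaccard(d_prots, set(rng.sample(proteins, len(g_prots)))) >= observed
                for _ in range(n_resamples)
            )
            p_value = (hits + 1) / (n_resamples + 1)  # add-one empirical p-value
            if p_value <= alpha:
                mappings[(domain, term)] = (observed, p_value)
    return mappings


def propagate(mappings, protein_domains):
    """Give a protein every GO term mapped from its domains, keeping the best score."""
    predictions = {}
    for (domain, term), (score, _) in mappings.items():
        if domain in protein_domains:
            predictions[term] = max(score, predictions.get(term, 0.0))
    return predictions


# Hypothetical toy data: p0-p4 carry PF_A and GO:A, p5-p19 carry PF_B and GO:B,
# and PF_C / GO:C overlap only once (on p0), which should not pass resampling.
domain_ann = {f"p{i}": {"PF_A"} for i in range(5)}
domain_ann.update({f"p{i}": {"PF_B"} for i in range(5, 20)})
go_ann = {f"p{i}": {"GO:A"} for i in range(5)}
go_ann.update({f"p{i}": {"GO:B"} for i in range(5, 20)})
domain_ann["p0"].add("PF_C")
domain_ann["p5"].add("PF_C")
go_ann["p0"].add("GO:C")
go_ann["p10"].add("GO:C")

mappings = map_domains_to_go(domain_ann, go_ann)
print(sorted(mappings))
print(propagate(mappings, {"PF_A", "PF_B"}))
```

On this toy input the sketch should retain only the two consistently co-annotated pairs and discard the single coincidental PF_C/GO:C overlap. Taking the maximum score per GO term during propagation is just one simple way to combine evidence from several domains on the same protein, chosen here for illustration.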

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df8d/11099699/6bfce5e22fa6/PRO-33-e4988-g001.jpg