Annotating protein functions via fusing multiple biological modalities.

Author information

Ma Wenjian, Bi Xiangpeng, Jiang Huasen, Wei Zhiqiang, Zhang Shugang

Affiliation

College of Computer Science and Technology, Ocean University of China, Qingdao, China.

Publication information

Commun Biol. 2024 Dec 27;7(1):1705. doi: 10.1038/s42003-024-07411-y.

Abstract

Understanding protein function is of great significance for revealing disease pathogenesis and discovering new targets. Benefiting from the explosive growth of the protein universe, deep learning has been applied to accelerate the protein annotation cycle using different biological modalities. However, most existing deep learning-based methods not only fail to fuse different biological modalities effectively, resulting in low-quality protein representations, but also converge to suboptimal solutions because of sparse label representations. To address these issues, we propose a multiprocedural approach for fusing heterogeneous biological modalities and annotating protein functions, MIF2GO (Multimodal Information Fusion to infer Gene Ontology terms), which sequentially fuses up to six biological modalities spanning different biological levels in three steps, yielding powerful protein representations. Evaluation results on seven benchmark datasets show that the proposed method not only considerably outperforms state-of-the-art methods, but also demonstrates strong robustness and generalizability across species. We also present biological insights into the associations between those modalities and protein functions. This research provides a robust framework for integrating multimodal biological data and a scalable solution for protein function annotation, ultimately facilitating advances in precision medicine and the discovery of novel therapeutic strategies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b9c/11681170/b38f0be6febf/42003_2024_7411_Fig1_HTML.jpg
