RecGOBD: accurate recognition of gene ontology related brain development protein functions through multi-feature fusion and attention mechanisms.

Author information

Xia Zhiliang, Ma Shiqiang, Li Jiawei, Guo Yan, Jiang Limin, Tang Jijun

Affiliations

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China.

Department of Public Health Sciences, University of Miami, Miami, FL 33136, United States.

Publication information

Bioinform Adv. 2024 Nov 4;4(1):vbae163. doi: 10.1093/bioadv/vbae163. eCollection 2024.

Abstract

MOTIVATION

Protein function prediction is crucial in bioinformatics, driven by the growth of protein sequence data from high-throughput technologies. Traditional methods are costly and slow, underscoring the need for computational solutions. While deep learning offers powerful tools, many models lack optimization for brain development datasets, critical for neurodevelopmental disorder research. To address this, we developed RecGOBD (Recognition of Gene Ontology-related Brain Development protein function), a model tailored to predict protein functions essential to brain development.

RESULTS

RecGOBD targets 10 key gene ontology (GO) terms for brain development, embedding protein sequences associated with these terms. Leveraging advanced pre-trained models, it captures both sequence and structure data, aligning them with GO terms through attention mechanisms. The category attention layer enhances prediction accuracy. RecGOBD surpassed five benchmark models in AUROC, AUPR, and Fmax metrics and was further used to predict autism-related protein functions and assess mutation impacts on GO terms. These findings highlight RecGOBD's potential in advancing protein function prediction for neurodevelopmental disorders.
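Of the three metrics named above, Fmax is the least standardized, so a small worked sketch may help. The abstract does not define it; the code below assumes the protein-centric definition commonly used in CAFA-style evaluations (maximum over score thresholds of the harmonic mean of averaged precision and recall). The function name `fmax`, the threshold grid, and the toy data are illustrative assumptions and are not taken from the RecGOBD code base, whose evaluation may differ in detail.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax under the assumed CAFA-style definition.

    y_true  : list of rows, one per protein, with 0/1 labels per GO term
    y_score : same shape, predicted scores in [0, 1]
    """
    if thresholds is None:
        # Assumed grid of 100 evenly spaced thresholds: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            n_true = sum(truth)
            if n_true == 0:
                continue  # skip proteins with no annotated GO term
            predicted = [s >= t for s in scores]
            n_pred = sum(predicted)
            tp = sum(1 for p, y in zip(predicted, truth) if p and y)
            # Recall is averaged over every annotated protein
            recalls.append(tp / n_true)
            # Precision is averaged only over proteins with a prediction at t
            if n_pred > 0:
                precisions.append(tp / n_pred)
        if not precisions or not recalls:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy example: two proteins, two GO terms, perfectly separable scores
labels = [[1, 0], [0, 1]]
scores = [[0.9, 0.1], [0.2, 0.8]]
print(fmax(labels, scores))
```

Because precision is averaged only over proteins that receive at least one prediction at a given threshold, a model that predicts confidently for few proteins can score high precision with low recall; taking the maximum F-score across thresholds balances the two.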

AVAILABILITY AND IMPLEMENTATION

All Python code associated with this study is available at https://github.com/ZL-Xia/RECGOBD.git.
