Xia Zhiliang, Ma Shiqiang, Li Jiawei, Guo Yan, Jiang Limin, Tang Jijun
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China.
Department of Public Health Sciences, University of Miami, Miami, FL 33136, United States.
Bioinform Adv. 2024 Nov 4;4(1):vbae163. doi: 10.1093/bioadv/vbae163. eCollection 2024.
Protein function prediction is a crucial task in bioinformatics, and its importance has grown with the volume of protein sequence data produced by high-throughput technologies. Traditional methods are costly and slow, underscoring the need for computational solutions. Although deep learning offers powerful tools, many models are not optimized for brain development datasets, which are critical for research on neurodevelopmental disorders. To address this, we developed RecGOBD (Recognition of Gene Ontology-related Brain Development protein function), a model tailored to predict protein functions essential to brain development.
RecGOBD targets 10 key gene ontology (GO) terms for brain development and embeds the protein sequences associated with these terms. Leveraging advanced pre-trained models, it captures both sequence and structure information and aligns them with the GO terms through attention mechanisms. A category attention layer further improves prediction accuracy. RecGOBD surpassed five benchmark models on the AUROC, AUPR, and Fmax metrics, and was further used to predict the functions of autism-related proteins and to assess the impact of mutations on GO terms. These findings highlight RecGOBD's potential for advancing protein function prediction in neurodevelopmental disorders.
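Of the three metrics named above, Fmax is the least standardized across toolkits, so a minimal sketch of the commonly used protein-centric definition may help: at each score threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over proteins with at least one true term, and Fmax is the largest resulting F-score. The function name `fmax`, the threshold grid, and the toy arrays are assumptions made for illustration; this is not taken from the RecGOBD code, which may compute the metric differently.

```python
import numpy as np


def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax (illustrative sketch, not the authors' code).

    y_true  : (n_proteins, n_terms) binary matrix of true GO annotations
    y_score : (n_proteins, n_terms) matrix of predicted scores in [0, 1]
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 1.00
        thresholds = np.linspace(0.01, 1.0, 100)

    n_true = y_true.sum(axis=1)
    has_truth = n_true > 0  # recall is only defined for annotated proteins
    best = 0.0
    for t in thresholds:
        y_pred = y_score >= t
        n_pred = y_pred.sum(axis=1)
        covered = n_pred > 0  # proteins with at least one prediction at t
        if not covered.any() or not has_truth.any():
            continue
        tp = (y_pred & y_true).sum(axis=1)
        precision = (tp[covered] / n_pred[covered]).mean()
        recall = (tp[has_truth] / n_true[has_truth]).mean()
        if precision + recall > 0:
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
    return float(best)


# Toy data: two proteins, two GO terms (made-up values).
toy_true = [[1, 0], [0, 1]]
toy_score = [[0.9, 0.1], [0.2, 0.8]]
toy_fmax = fmax(toy_true, toy_score)
```

With 10 GO terms, as in the setting described above, `y_true` and `y_score` would each have 10 columns, one per term.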
All Python code associated with this study is available at https://github.com/ZL-Xia/RECGOBD.git.