Tawfiq Rund, Niu Kexin, Hoehndorf Robert, Kulmanov Maxat
KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.
Biological and Environmental Sciences & Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.
Sci Rep. 2024 Dec 30;14(1):31813. doi: 10.1038/s41598-024-82956-w.
Analyzing microbial samples remains computationally challenging because of their diversity and complexity. The lack of robust de novo protein function prediction methods makes it even harder to derive functional insights from these samples. Traditional prediction methods, which rely on homology and sequence similarity, often fail to predict functions for novel proteins and for proteins without known homologs. Moreover, most of these methods have been trained largely on eukaryotic data and have not been evaluated on or applied to microbial datasets. This research introduces DeepGOMeta, a deep learning model that predicts protein functions as Gene Ontology (GO) terms and is trained on a dataset relevant to microbes. The model is applied to diverse microbial datasets to demonstrate its use for gaining biological insights. Data and code are available at https://github.com/bio-ontology-research-group/deepgometa.
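To show concretely why similarity-based methods leave novel proteins unannotated, here is a minimal sketch of generic homology-based annotation transfer. It is not DeepGOMeta and not any specific published baseline. It assumes sequence-similarity search hits (subject ID, fractional identity, bitscore) have already been computed for a query protein; the function name `transfer_annotations`, the `min_identity` threshold, and the bitscore weighting are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def transfer_annotations(hits, annotations, min_identity=0.3):
    """Score GO terms for one query protein by copying them from similar proteins.

    Hypothetical illustration of homology-based transfer, not DeepGOMeta.
    hits: list of (subject_id, identity, bitscore), identity as a fraction in [0, 1].
    annotations: dict mapping subject_id -> set of GO term IDs.
    Returns a dict mapping GO term -> score in [0, 1].
    """
    # Keep only sufficiently similar hits whose subject has known annotations.
    kept = [(subject, bitscore) for subject, identity, bitscore in hits
            if identity >= min_identity and subject in annotations]
    total = sum(bitscore for _, bitscore in kept)

    scores = defaultdict(float)
    # Each hit contributes its share of the total bitscore to every term it carries.
    # If no hit survives, this loop never runs and the query gets no predictions.
    for subject, bitscore in kept:
        for term in annotations[subject]:
            scores[term] += bitscore / total
    return dict(scores)


if __name__ == "__main__":
    annotations = {"P1": {"GO:0005524", "GO:0016887"}, "P2": {"GO:0005524"}}
    # A query with two annotated homologs receives transferred GO terms.
    print(transfer_annotations([("P1", 0.62, 300.0), ("P2", 0.45, 100.0)], annotations))
    # A novel protein with no homologs above the threshold receives nothing.
    print(transfer_annotations([("P2", 0.20, 50.0)], annotations))
```

The empty result for the second query is the failure mode the abstract describes: when no annotated homolog exists, a transfer-based method has nothing to copy, which is the gap a de novo predictor is meant to fill.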