Hirota Keisuke, Salim Felix, Yamada Takuji
School of Life Science and Technology, Institute of Science Tokyo, Tokyo, 152-8550, Japan.
Metagen, Inc., Yamagata, 997-0052, Japan.
Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf053.
Progress in sequencing technology has led to the determination of large numbers of protein sequences, and large enzyme databases are now available. Although many computational tools for enzyme annotation have been developed, sequence information is unavailable for many enzymes, known as orphan enzymes. These orphan enzymes hinder sequence similarity-based functional annotation, leading to gaps in understanding the association between sequences and enzymatic reactions.
Therefore, we developed DeepES, a deep learning-based tool for enzyme screening to identify orphan enzyme genes, focusing on biosynthetic gene clusters and reaction class. DeepES uses protein sequences as inputs and evaluates whether the input genes contain biosynthetic gene clusters of interest by integrating the outputs of the binary classifier for each reaction class. The validation results suggested that DeepES can capture functional similarity between protein sequences and can be used to explore orphan enzyme genes. By applying DeepES to 4744 metagenome-assembled genomes, we identified candidate genes for 236 orphan enzymes, including those involved in short-chain fatty acid production, a characteristic pathway in human gut bacteria.
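The idea of integrating per-reaction-class classifier outputs over neighboring genes can be illustrated with a minimal sketch. This is not the DeepES implementation: the function names, the sliding-window layout, the reaction-class labels, and the scoring rule (the mean, over the target reaction classes, of the highest probability found in the window) are all assumptions chosen for illustration, and the probabilities are made-up numbers.

```python
from typing import Dict, List, Sequence, Tuple


def score_windows(
    gene_scores: Sequence[Dict[str, float]],
    target_classes: Sequence[str],
    window: int,
) -> List[Tuple[int, float]]:
    """Score every run of `window` consecutive genes.

    gene_scores[i] maps a reaction-class label to the (hypothetical)
    binary-classifier probability that gene i catalyzes that class.
    A window's score is the mean, over the target classes, of the
    best probability any gene in the window reaches for that class.
    This scoring rule is an assumption, not the published method.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not target_classes:
        raise ValueError("target_classes must not be empty")

    results: List[Tuple[int, float]] = []
    for start in range(len(gene_scores) - window + 1):
        genes = gene_scores[start:start + window]
        # Best supporting gene for each required reaction class.
        per_class = [
            max(g.get(cls, 0.0) for g in genes) for cls in target_classes
        ]
        results.append((start, sum(per_class) / len(per_class)))
    return results


def best_window(
    gene_scores: Sequence[Dict[str, float]],
    target_classes: Sequence[str],
    window: int,
) -> Tuple[int, float]:
    """Return (start index, score) of the highest-scoring window."""
    scored = score_windows(gene_scores, target_classes, window)
    if not scored:
        raise ValueError("fewer genes than the window size")
    return max(scored, key=lambda item: item[1])


# Illustrative probabilities for four consecutive genes (made-up values).
example_scores = [
    {"class_A": 0.1, "class_B": 0.2},
    {"class_A": 0.9, "class_B": 0.1},
    {"class_A": 0.2, "class_B": 0.8},
    {"class_A": 0.1, "class_B": 0.1},
]

start, score = best_window(example_scores, ["class_A", "class_B"], window=2)
print(f"best window starts at gene {start} with score {score:.2f}")
```

In this toy input, the window covering the second and third genes scores highest because each target class is strongly supported by one of the two genes. A product or minimum over classes would penalize a missing reaction more sharply than the mean used here; which aggregation suits a given pathway is a design choice left open in this sketch.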
DeepES is available at https://github.com/yamada-lab/DeepES. Model weights and the candidate genes are available at Zenodo (https://doi.org/10.5281/zenodo.11123900).