Malwe Aditya S, Longwani Usha, Sharma Vineet K
MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, 462066, India.
NAR Genom Bioinform. 2025 May 1;7(2):lqaf037. doi: 10.1093/nargab/lqaf037. eCollection 2025 Jun.
Application of machine learning-based methods to identify novel bacterial enzymes capable of degrading a wide range of xenobiotics offers enormous potential for bioremediation of toxic and carcinogenic recalcitrant xenobiotics, such as pesticides, plastics, petroleum, and pharmacological products, that adversely impact ecology and health. Using 6814 diverse substrates involved in ∼141 200 biochemical reactions, we have developed 'XenoBug', a machine learning-based tool that predicts the bacterial enzymes, the enzymatic reactions, and the species capable of biodegrading xenobiotics, as well as the metagenomic source of the predicted enzymes. For training, we used a hybrid feature set comprising 1603 molecular descriptors together with linear and circular fingerprints. XenoBug also incorporates enzyme datasets consisting of ∼3.3 million enzyme sequences derived from an environmental metagenome database and ∼16 million enzymes from ∼38 000 bacterial genomes. Across reaction classes, XenoBug shows very high binary accuracies (>0.75) and F1 scores (>0.62). XenoBug was also validated on a set of diverse classes of xenobiotics, such as pesticides, environmental pollutants, pharmacological products, and hydrocarbons, that are known to be degraded by bacterial enzymes. XenoBug predicted both known and previously unreported metabolic enzymes for the degradation of molecules in the validation set, demonstrating its broad utility for predicting the metabolism of any input xenobiotic molecule. XenoBug is available at: https://metabiosys.iiserb.ac.in/xenobug.
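The abstract reports performance per reaction class as binary accuracy (>0.75) and F1 score (>0.62). These two metrics can diverge when positive examples are rare, because accuracy also rewards correct negatives while F1 looks only at the positive class. The following is a minimal sketch, not XenoBug's evaluation code: it assumes each reaction class is scored as an independent 0/1 prediction, and the function names and toy labels are hypothetical.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for one reaction class with labels encoded as 0/1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # class correctly predicted as applicable
        elif t == 0 and p == 1:
            fp += 1  # class predicted but not observed
        elif t == 1 and p == 0:
            fn += 1  # observed class that was missed
        else:
            tn += 1  # class correctly predicted as not applicable
    return tp, fp, fn, tn


def binary_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all predictions (positive and negative) that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy labels: 3 positives out of 10 substrates for one class.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(binary_accuracy(y_true, y_pred))     # 0.8
    print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

In this toy case the six true negatives lift accuracy to 0.8, while F1, which ignores true negatives, is only about 0.67. This is why reporting both metrics, as the abstract does, gives a fuller picture for imbalanced reaction classes.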