enviRule: an end-to-end system for automatic extraction of reaction patterns from environmental contaminant biotransformation pathways.

Affiliations

Department of Environmental Chemistry, Eawag, Dübendorf 8600, Switzerland.

Department of Chemistry, University of Zürich, Zürich 8057, Switzerland.

Publication information

Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad407.

Abstract

MOTIVATION

Transformation products (TPs) of man-made chemicals, formed through microbially mediated transformation in the environment, can have serious adverse environmental effects, yet the analytical identification of TPs is challenging. Rule-based prediction tools, which encode existing knowledge of enzyme-mediated biotransformation reactions, are successful in predicting TPs, especially in environmental chemistry applications that typically have to rely on small datasets. However, the rules extracted from biotransformation reaction databases are often over- or under-generalized and cannot easily be updated with new reactions.

RESULTS

We developed an automatic rule extraction tool called enviRule. It clusters biotransformation reactions into groups based on the similarity of their reaction fingerprints, and then automatically extracts and generalizes a rule for each reaction group in SMARTS format. It optimizes the genericity of the automatic rules against the downstream TP prediction task. Models trained with automatic rules outperformed models trained with manually curated rules by 30% in area under the curve (AUC) scores. Moreover, automatic rules can be easily updated with new reactions, highlighting enviRule's strengths in both the automatic extraction of optimized reaction rules and their automated updating.
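The clustering step can be illustrated with a minimal sketch. This is not enviRule's actual implementation: it assumes each reaction fingerprint is already available as a set of on-bit indices, uses Tanimoto similarity, and groups reactions with a simple greedy pass. The function names, the toy fingerprints, and the threshold of 0.5 are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_reactions(
    fingerprints: Dict[str, FrozenSet[int]], threshold: float
) -> List[List[str]]:
    """Greedy single-pass clustering (an illustrative stand-in, not enviRule's method).

    A reaction joins the first existing group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for name, fp in fingerprints.items():
        for group in groups:
            if tanimoto(fp, fingerprints[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            # No sufficiently similar group was found.
            groups.append([name])
    return groups


# Made-up fingerprints for five hypothetical reactions.
toy_fingerprints = {
    "rxn_a": frozenset({1, 2, 3, 4}),
    "rxn_b": frozenset({1, 2, 3, 5}),
    "rxn_c": frozenset({10, 11, 12}),
    "rxn_d": frozenset({10, 11, 13}),
    "rxn_e": frozenset({1, 2, 3, 4, 5}),
}

groups = cluster_reactions(toy_fingerprints, threshold=0.5)
print(groups)
```

In a sketch like this, each resulting group would then be passed to a rule extraction step that generalizes one SMARTS pattern per group. The threshold controls how broad the groups are, which is one place where over- or under-generalization of the resulting rules could arise.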

AVAILABILITY AND IMPLEMENTATION

enviRule code is freely available at https://github.com/zhangky12/enviRule.
