Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland.
Institute of Computer Science, Johannes Gutenberg University Mainz, 55128 Mainz, Germany.
Environ Sci Process Impacts. 2017 Mar 22;19(3):449-464. doi: 10.1039/c6em00697c.
Developing models that predict microbial biotransformation pathways and half-lives of trace organic contaminants in different environments requires training data: easily accessible, sufficiently large collections of biotransformation data annotated with metadata on study conditions. Here, we present the Eawag-Soil package, a public database developed to contain all freely accessible regulatory data on pesticide degradation in laboratory soil simulation studies for pesticides registered in the EU (282 degradation pathways, 1535 reactions, 1619 compounds and 4716 biotransformation half-life values with corresponding metadata on study conditions). We provide a thorough description of this novel data resource and discuss important features of the pesticide soil degradation data that are relevant for model development. Most notably, the variability of half-life values for individual compounds is large, only about one order of magnitude lower than the entire range of median half-life values spanned by all compounds, which demonstrates the need to consider study conditions when developing more accurate models for biotransformation prediction. We further show how the data can be used to find missing rules relevant for predicting soil biotransformation pathways. From this analysis, we present eight examples of reaction types that should trigger the formulation of new biotransformation rules, e.g., Ar-OH methylation, or the extension of existing rules, e.g., hydroxylation in aliphatic rings. We also use the data to explore, by way of example, how the half-lives of different amide pesticides depend on chemical class and experimental parameters.
This analysis highlighted the value of considering initial transformation reactions when developing meaningful quantitative structure-biotransformation relationships (QSBR), a novel opportunity offered by the simultaneous encoding of transformation reactions and corresponding half-lives in Eawag-Soil. Overall, Eawag-Soil provides an unprecedentedly rich collection of manually extracted and curated biotransformation data, which should be useful in a great variety of applications.
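The comparison between within-compound variability and the range of median half-lives can be illustrated with a minimal sketch. It assumes half-life records are available as simple (compound, half-life in days) pairs; the function name `half_life_spread`, the record layout, and all numbers are hypothetical and are not taken from Eawag-Soil or its interface. Spread is expressed in orders of magnitude, i.e. the log10 of the ratio between the largest and smallest value.

```python
import math
from collections import defaultdict
from statistics import median


def half_life_spread(records):
    """Summarise half-life variability in orders of magnitude (log10 units).

    records: iterable of (compound, half_life_days) pairs (hypothetical layout).
    Returns (within, between), where
      within  maps each compound to log10(max / min) of its half-lives, and
      between is log10(max / min) of the per-compound median half-lives.
    """
    by_compound = defaultdict(list)
    for compound, half_life_days in records:
        by_compound[compound].append(half_life_days)

    # Spread of reported half-lives for each individual compound.
    within = {c: math.log10(max(v) / min(v)) for c, v in by_compound.items()}

    # Range spanned by the median half-lives across all compounds.
    medians = [median(v) for v in by_compound.values()]
    between = math.log10(max(medians) / min(medians))
    return within, between


# Made-up values for illustration only, not data from Eawag-Soil.
records = [
    ("A", 1.0), ("A", 10.0), ("A", 100.0),
    ("B", 1000.0), ("B", 10000.0),
    ("C", 0.1), ("C", 1.0),
]
within, between = half_life_spread(records)
print(within, between)
```

In this toy data set, the half-lives of a single compound span up to two orders of magnitude while the medians span about four, so a model that ignores study conditions could not resolve differences smaller than the within-compound spread.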