Vorberg Susann, Tetko Igor V
Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany. Tel: +49-89-3187-3575; fax: +49-89-3187-3585.
Chemistry Department, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah 21589, Saudi Arabia.
Mol Inform. 2014 Jan;33(1):73-85. doi: 10.1002/minf.201300030. Epub 2013 Nov 28.
Biodegradability describes the capacity of substances to be mineralized by free-living bacteria. It is a crucial property in estimating a compound's long-term impact on the environment. The ability to reliably predict biodegradability would reduce the need for laborious experimental testing. However, this endpoint is difficult to model due to unavailability or inconsistency of experimental data. Our approach makes use of the Online Chemical Modeling Environment (OCHEM) and its rich supply of machine learning methods and descriptor sets to build classification models for ready biodegradability. These models were analyzed to determine the relationship between characteristic structural properties and biodegradation activity. The distinguishing feature of the developed models is their ability to estimate the accuracy of prediction for each individual compound. The models developed using seven individual descriptor sets were combined in a consensus model, which provided the highest accuracy. The identified overrepresented structural fragments can be used by chemists to improve the biodegradability of new chemical compounds. The consensus model, the datasets used, and the calculated structural fragments are publicly available at http://ochem.eu/article/31660.
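The abstract does not say how the seven individual models are combined or how the per-compound accuracy estimate is derived. As a rough illustration only, the sketch below assumes the simplest scheme: each model returns a probability that a compound is readily biodegradable, the consensus is the arithmetic mean of those probabilities, and the spread between models serves as a stand-in for prediction reliability. The function name `consensus_predict`, the 0.5 threshold, and the sample probabilities are all hypothetical and are not taken from OCHEM or from the paper.

```python
from statistics import mean, pstdev


def consensus_predict(probabilities, threshold=0.5):
    """Combine per-model probabilities for a single compound.

    Assumption: consensus = plain average of the individual models.
    Returns (label, consensus_probability, disagreement), where
    disagreement is the population standard deviation across models.
    """
    if not probabilities:
        raise ValueError("need at least one model prediction")
    consensus = mean(probabilities)
    disagreement = pstdev(probabilities)
    if consensus >= threshold:
        label = "readily biodegradable"
    else:
        label = "not readily biodegradable"
    return label, consensus, disagreement


# Hypothetical outputs of seven individual models for two compounds.
agreeing = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89]
conflicting = [0.95, 0.10, 0.80, 0.20, 0.70, 0.30, 0.60]

for name, probs in (("agreeing", agreeing), ("conflicting", conflicting)):
    label, p, spread = consensus_predict(probs)
    print(f"{name}: {label} (p={p:.2f}, disagreement={spread:.2f})")
```

Under this assumption, a compound on which the models disagree strongly gets a larger spread and would be treated as a less reliable prediction, even when its averaged class label matches that of a compound on which the models agree.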