Filippo Lunghini, Gilles Marcou, Philippe Azam, Marie-Hélène Enrici, Erik Van Miert, Alexandre Varnek
Laboratory of Chemoinformatics - UMR7140, University of Strasbourg, 4 Rue Blaise Pascal, 67081, Strasbourg, France.
Toxicological and Environmental Risk Assessment unit, Solvay S.A., 85, avenue des Frères Perret, 69192, St. Fons, France.
Mol Inform. 2021 Apr;40(4):e2000232. doi: 10.1002/minf.202000232. Epub 2020 Nov 24.
In the framework of the REACH (Registration, Evaluation, Authorization and Restriction of Chemicals) regulation, industries have generated and reported a large amount of (eco)toxicological data on substances produced in or imported into Europe. The registration procedure led to the creation of a large REACH database of well-defined (eco)toxicological properties. Here, the data distribution in the REACH chemical space was analyzed using the Generative Topographic Mapping (GTM) approach. GTM generates 2-dimensional maps on which each compound is represented as a data point. A third dimension can be used to display the distribution of a given (eco)toxicological property, which can in turn be used to assess that property for new compounds projected onto the map. We report the "Universal REACH map", which accommodates 11 endpoints covering environmental fate and (eco)toxicological properties. This map demonstrates acceptable predictive performance: in cross-validation, balanced accuracy ranges from 0.60 to 0.78. The 11-endpoint profile has been computed for each REACH-registered substance. Some concerns related to acute aquatic toxicity were identified, whereas for environmental fate and human health endpoints the number of compounds predicted to be of concern was much smaller. Superposition of several class landscapes is shown to allow the selection of zones in the chemical space populated by compounds with a given (eco)toxicological profile.
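As an illustration of two quantities named in the abstract, the sketch below computes balanced accuracy for a binary endpoint and superposes toy class landscapes to pick out map nodes that match a desired profile. It is not the authors' implementation. The function names, the 0.5 threshold, the 2x2 grid, the endpoint labels, and all numbers are assumptions made up for the example, and each landscape is simply assumed to hold, per map node, the probability of the "of concern" class.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a class landscape is a grid of per-node probabilities
# of the "of concern" class for one endpoint.
Grid = List[List[float]]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = of concern)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def superpose(
    landscapes: Dict[str, Grid],
    profile: Dict[str, bool],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[Tuple[int, int]]:
    """Return (row, col) nodes whose predicted class matches `profile`
    for every endpoint listed in it."""
    names = list(profile)
    first = landscapes[names[0]]
    selected = []
    for r in range(len(first)):
        for c in range(len(first[0])):
            if all((landscapes[n][r][c] >= threshold) == profile[n] for n in names):
                selected.append((r, c))
    return selected


# Toy data only: two illustrative endpoints on a 2x2 map.
toy_landscapes = {
    "acute_aquatic_toxicity": [[0.9, 0.2], [0.7, 0.1]],
    "persistence": [[0.8, 0.6], [0.3, 0.4]],
}
# Desired profile: of concern for aquatic toxicity, not of concern for persistence.
wanted = {"acute_aquatic_toxicity": True, "persistence": False}

zones = superpose(toy_landscapes, wanted)
ba = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(zones, ba)
```

In this toy setting only the node at row 1, column 0 satisfies both conditions. In practice the landscapes would come from a trained GTM model rather than hand-written grids.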