Singh Satnam, Zeh Gina, Freiherr Jessica, Bauer Thilo, Türkmen Isik, Grasskamp Andreas T
Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany.
Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany.
J Cheminform. 2024 Apr 16;16(1):45. doi: 10.1186/s13321-024-00835-y.
In this paper, we present a method that leverages 3D electron density information to train a deep neural network pipeline to segment regions of high, medium and low electronegativity and to classify substances as hazardous or non-hazardous to health. We show that this approach applies to use cases such as cosmetics and food products. For this purpose, we first generate 3D electron density cubes using semiempirical molecular calculations for a custom European Chemicals Agency (ECHA) subset consisting of substances labelled as hazardous or non-hazardous for cosmetic usage. Using these electron density cubes together with their 3-class electronegativity maps, we train a modified 3D-UNet to segment reactive sites in molecules and classify substances with an accuracy of 78.1%. We apply the same process to a custom food dataset (CompFood), consisting of hazardous and non-hazardous substances compiled from the European Food Safety Authority (EFSA) OpenFoodTox, Food and Drug Administration (FDA) Generally Recognized as Safe (GRAS) and FooDB datasets, and achieve a classification accuracy of 64.1%. Our results show that 3D electron densities, and particularly masked electron densities calculated as the product of the original electron densities and the regions of high and low electronegativity, can be used to classify molecules for different use cases and thus serve not only to guide safe-by-design product development but also to aid regulatory decisions. SCIENTIFIC CONTRIBUTION: We aim to contribute to the diverse 3D molecular representations used for training machine learning algorithms by showing that a deep learning network can be trained on a 3D electron density representation of molecules. This approach has not previously been used to train machine learning models, and it allows the true spatial domain of the molecule to be used for predicting properties such as suitability for use in cosmetics and food products and, in future, other molecular properties.
The data and code used for training are accessible at https://github.com/s-singh-ivv/eDen-Substances.
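The masked electron density described in the abstract, the product of the original density and the regions of high and low electronegativity, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the label convention (0 = low, 1 = medium, 2 = high), the function name `masked_density` and the toy 4x4x4 cube are all assumptions made for illustration, and the abstract does not state how the classes are encoded.

```python
import numpy as np

# Hypothetical label convention for the 3-class electronegativity map;
# the encoding used in the paper is not given in the abstract.
LOW, MEDIUM, HIGH = 0, 1, 2


def masked_density(density: np.ndarray, en_map: np.ndarray) -> np.ndarray:
    """Keep density only in voxels labelled high or low electronegativity.

    Voxels labelled MEDIUM are set to zero; all others keep their
    original density value (element-wise product with a binary mask).
    """
    if density.shape != en_map.shape:
        raise ValueError("density and en_map must have the same shape")
    mask = (en_map == HIGH) | (en_map == LOW)  # boolean mask, same shape
    return density * mask  # booleans act as 0/1 in the product


# Toy stand-in for a 3D electron density cube and its segmentation map.
rng = np.random.default_rng(0)
density = rng.random((4, 4, 4))
en_map = rng.integers(0, 3, size=(4, 4, 4))

masked = masked_density(density, en_map)
```

In this sketch the masked cube keeps the shape of the input, so it could be passed to a 3D network in place of the raw density; whether the published pipeline does exactly this is not stated in the abstract.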