Smajić Aljoša, Grandits Melanie, Ecker Gerhard F
Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria.
J Cheminform. 2022 Aug 13;14(1):54. doi: 10.1186/s13321-022-00635-2.
Machine learning (ML) models require an extensive, user-driven selection of molecular descriptors in order to learn from chemical structures and predict actives and inactives with high reliability. In addition, privacy concerns often restrict access to sufficient data, leading to models that cover only a narrow chemical space. Therefore, we propose a framework of re-trainable models that can be transferred from one local instance to another and that require a less extensive descriptor selection. The models are shared via a Jupyter Notebook, which allows a broader chemical space to be evaluated and incorporated while keeping most of the tunable parameters pre-defined. This enables the models to be updated in a decentralized, facile, and fast manner. The method was evaluated on six transporter datasets (BCRP, BSEP, OATP1B1, OATP1B3, MRP3, P-gp), which demonstrated the general applicability of the approach.
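The idea of a model that is trained at one site, transferred, and re-trained at another without exchanging raw data can be illustrated with a minimal sketch. This is not the authors' notebook or model architecture: it assumes a plain logistic regression written in standard-library Python as a stand-in, the class name `RetrainableClassifier` is hypothetical, and the two-dimensional "descriptor" vectors are invented toy numbers rather than real transporter data.

```python
import json
import math
import random


class RetrainableClassifier:
    """Hypothetical stand-in: a tiny logistic regression updated by SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def partial_fit(self, X, y, epochs=200, seed=0):
        """Continue training from the current weights (no reset)."""
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                error = self.predict_proba(X[i]) - y[i]
                for j, xij in enumerate(X[i]):
                    self.weights[j] -= self.learning_rate * error * xij
                self.bias -= self.learning_rate * error
        return self

    def to_json(self):
        """Serialize only the parameters; no training data leaves the site."""
        return json.dumps({"weights": self.weights, "bias": self.bias,
                           "learning_rate": self.learning_rate})

    @classmethod
    def from_json(cls, payload):
        state = json.loads(payload)
        model = cls(len(state["weights"]), state["learning_rate"])
        model.weights = list(state["weights"])
        model.bias = state["bias"]
        return model


def accuracy(model, X, y):
    return sum(model.predict(x) == label for x, label in zip(X, y)) / len(y)


# Toy descriptor vectors (invented); 1 = active, 0 = inactive.
X_site_a = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y_site_a = [0, 0, 1, 1]
X_site_b = [[0.0, 0.4], [0.3, 0.0], [1.0, 0.6], [0.7, 1.0]]
y_site_b = [0, 0, 1, 1]

# Site A trains locally and shares only the serialized parameters.
model_a = RetrainableClassifier(n_features=2).partial_fit(X_site_a, y_site_a)
shared_payload = model_a.to_json()

# Site B restores the model and updates it with its own private data.
model_b = RetrainableClassifier.from_json(shared_payload)
restored_weights = list(model_b.weights)
model_b.partial_fit(X_site_b, y_site_b)

print("Site A data:", accuracy(model_b, X_site_a, y_site_a))
print("Site B data:", accuracy(model_b, X_site_b, y_site_b))
```

Only the parameter payload crosses the site boundary in this sketch, which mirrors the decentralized update described above; the hyperparameters (`learning_rate`, `epochs`) are fixed in advance, loosely analogous to keeping most tunable parameters pre-defined.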