Cherednichenko Anton, Afonin Sergii, Babii Oleg, Voitsitskyi Taras, Stratiichuk Roman, Koleiev Ihor, Vozniak Volodymyr, Shevchuk Nazar, Ostrovsky Zakhar, Yesylevskyy Semen, Nafiiev Alan, Starosyla Serhii, Ulrich Anne S, Jirgensons Aigars, Komarov Igor V
Taras Shevchenko National University of Kyiv, Kyiv, Ukraine.
Enamine Ltd, Kyiv, Ukraine.
Mol Inform. 2025 Jul;44(7):e70001. doi: 10.1002/minf.70001.
Prediction of the biological activities of chemical compounds by machine learning techniques in general, and by neural networks (NNs) in particular, is usually based on analysis of their binding to the target of interest. If such affinity data are not available, ligand-based approaches can be used, in which NN models are trained to assess the similarity of compounds to those with known biological activity. This approach works well only if the similarity between the training set and the evaluated molecules is sufficiently high. For large and conformationally flexible organic compounds, activity depends not only on chemical identity but also on the dynamics of molecular motions, which poses significant challenges to existing approaches based on static 2D and 3D molecular descriptors. A prominent example of compounds that are especially challenging for existing NN activity prediction techniques is photoswitchable macrocyclic peptides containing a diarylethene (DAE) "photoswitch". These molecules exist in two isomeric forms with remarkably different biological activities, which are interconvertible by light of different wavelengths. In this case, activity prediction models have to distinguish not only between different peptides but also between the photoisomers of the same peptide. In this work, we demonstrate that features extracted from classical molecular dynamics (MD) trajectories are superior to conventional 2D or 3D descriptor-based features when used in NN activity prediction models of DAE-containing photoswitchable peptides. Using MD-derived features, we created two NN models that predict the activities of photoswitchable peptidomimetics, analogs of the natural peptide antibiotic gramicidin S. The first model precisely predicts the cytotoxic activity of similar peptide analogs. The second model reliably predicts the differences in biological activity between the DAE photoisomers of the same peptide, even if the type of activity differs from that in the training dataset. Our results demonstrate that accounting for MD-derived dynamic features allows ligand-based NN activity prediction models to be generalized to large and conformationally flexible molecules, which were previously considered intractable for this class of models.
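The contrast between static descriptors and MD-derived dynamic features can be illustrated with a minimal sketch. The abstract does not say which features the authors extract, so everything below is an assumption made for illustration: the radius of gyration as the per-frame quantity, the function names `radius_of_gyration` and `dynamic_features`, and the toy two-atom trajectories. It shows only the general idea that two molecules with the same starting structure can yield different time-averaged descriptors.

```python
import math
from statistics import mean, pstdev


def radius_of_gyration(frame):
    """Unweighted radius of gyration of one frame (a list of (x, y, z) tuples)."""
    n = len(frame)
    cx = sum(p[0] for p in frame) / n
    cy = sum(p[1] for p in frame) / n
    cz = sum(p[2] for p in frame) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in frame
    ) / n
    return math.sqrt(mean_sq)


def dynamic_features(trajectory):
    """Summarize a trajectory (a list of frames) as time-aggregated descriptors.

    Hypothetical feature set chosen for illustration; not the authors' features.
    """
    rg = [radius_of_gyration(frame) for frame in trajectory]
    return {
        "rg_mean": mean(rg),           # average compactness over time
        "rg_std": pstdev(rg),          # how much the compactness fluctuates
        "rg_range": max(rg) - min(rg), # extent of conformational breathing
    }


# Toy data: two "molecules" of two atoms each, identical in the first frame.
rigid = [[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]] * 3
flexible = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]

rigid_features = dynamic_features(rigid)
flexible_features = dynamic_features(flexible)
```

A descriptor computed from the first frame alone is the same for both toy molecules, whereas `rg_std` and `rg_range` are zero for the rigid one and nonzero for the flexible one. In a real workflow the frames would come from an MD trajectory file, and the resulting feature vector would be one input to the NN model.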