Zhu Jianshen, Azam Naveed Ahmed, Cao Shengjuan, Ido Ryota, Haraguchi Kazuya, Zhao Liang, Nagamochi Hiroshi, Akutsu Tatsuya
Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Japan.
Department of Mathematics, Quaid-i-Azam University, Islamabad, Pakistan.
Front Genet. 2025 Jan 29;15:1483490. doi: 10.3389/fgene.2024.1483490. eCollection 2024.
Compound inference models are crucial for discovering novel drugs in bioinformatics and chemo-informatics. These models rely heavily on descriptors of chemical compounds that capture enough information about the underlying compounds to construct accurate prediction functions. In this article, we introduce quadratic descriptors, the products of two graph-theoretic descriptors, to enhance the learning performance of a novel two-layered compound inference model. A mixed-integer linear programming formulation is designed to approximate these quadratic descriptors so that desired compounds can be inferred with the two-layered model. Furthermore, we introduce different methods for reducing the set of descriptors, to limit the computational cost and the overfitting that the large number of quadratic descriptors would otherwise cause during learning. Experimental results show that for 32 chemical properties of monomers and 10 chemical properties of polymers, the prediction functions constructed by the proposed method achieved high test coefficients of determination. Furthermore, our method inferred chemical compounds in running times ranging from a few seconds to approximately 60 s. These results indicate a strong correlation between the properties of chemical graphs and their quadratic graph-theoretic descriptors.
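As a rough illustration of the quadratic descriptors described above, the following minimal sketch forms all pairwise products of a vector of descriptor values. It is not the authors' implementation: the function name `quadratic_descriptors`, the example values, and the choice to include each unordered pair once (including squares) are assumptions made only for illustration.

```python
from itertools import combinations_with_replacement


def quadratic_descriptors(x):
    """Return index pairs (i, j) with i <= j and the products x[i] * x[j].

    `x` is a list of (hypothetical) graph-theoretic descriptor values
    for one chemical graph.
    """
    pairs = list(combinations_with_replacement(range(len(x)), 2))
    values = [x[i] * x[j] for i, j in pairs]
    return pairs, values


# Hypothetical descriptor values for a single compound.
linear = [2.0, 3.0, 0.5]
pairs, quad = quadratic_descriptors(linear)

for (i, j), v in zip(pairs, quad):
    print(f"x[{i}] * x[{j}] = {v}")
```

Under this assumption, K descriptors yield K(K+1)/2 products, so the feature count grows quadratically, which is consistent with the abstract's motivation for reducing the descriptor set before learning.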