Chen Cheng, Wang Ledu, Feng Yi, Yao Wencheng, Liu Jiahe, Jiang Zifan, Zhao Luyuan, Zhang Letian, Jiang Jun, Feng Shuo
State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
MOE Key Laboratory of Resources and Environmental System Optimization, College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China
Chem Sci. 2025 Mar 13;16(15):6355-6365. doi: 10.1039/d5sc00451a. eCollection 2025 Apr 9.
Machine learning models have emerged as powerful tools for discovering lead compounds. Nevertheless, despite notable advances in model architectures, research on more reliable, physicochemically grounded descriptors for molecules and proteins remains limited. To address this gap, we introduce the Fragment Integral Spectrum Descriptor (FISD), a novel physicochemical descriptor for virtual screening models that draws on the spatial configuration and electronic structure of molecules and proteins. Validation shows that FISD combined with a classical neural network model achieves performance comparable to that of complex models paired with conventional structural descriptors. Furthermore, we successfully predict and screen potential binding ligands for two given protein targets, demonstrating the broad applicability and practicality of FISD. This research enriches the molecular and protein representation strategies available to machine learning and accelerates the drug discovery process.