Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China.
College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China.
J Chem Inf Model. 2021 Jun 28;61(6):2697-2705. doi: 10.1021/acs.jcim.0c01489. Epub 2021 May 19.
Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are then evaluated for their target binding affinities, side effects, probabilities of missing the target, etc. Conventional machine learning algorithms have demonstrated satisfactory accuracy in predicting molecular properties. However, a molecule cannot be loaded directly into a machine learning model; a set of engineered features must first be designed and calculated from it. Such hand-crafted features rely heavily on the experience of the investigating researchers. Graph neural networks (GNNs) were recently introduced to describe chemical molecules. With GNNs, features may be extracted automatically and objectively from the molecules, using architectures such as GCN (graph convolution network), GGNN (gated graph neural network), and DMPNN (directed message passing neural network). However, training a stable GNN model requires far more training samples and computing power than conventional machine learning strategies. This study proposes the integrated framework XGraphBoost, which extracts features using a GNN and builds an accurate prediction model of molecular properties using XGBoost. XGraphBoost thus inherits the merits of both GNN-based automatic molecular feature extraction and XGBoost-based accurate prediction. The framework was evaluated on both classification and regression problems. The experimental results strongly suggest that XGraphBoost may facilitate efficient and accurate prediction of various molecular properties. The source code is freely available to academic users at https://github.com/chenxiaowei-vincent/XGraphBoost.git.
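The two-stage idea described above (graph-derived features handed to a boosted-tree model) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and is not taken from the XGraphBoost code: the function names, the fixed untrained weight matrix, the six toy molecules, and the synthetic target (heavy-atom count). A small gradient-boosting loop over decision stumps stands in for XGBoost, and a two-round sum-aggregation network stands in for a trained GCN, GGNN, or DMPNN.

```python
# Toy sketch: message-passing features followed by boosted stumps.
# Assumptions: untrained fixed weights, synthetic target, stand-in for XGBoost.

def embed_molecule(atom_features, bonds, weight, rounds=2):
    """Turn a molecular graph into a fixed-length vector (sum readout)."""
    n, dim = len(atom_features), len(atom_features[0])
    neighbors = [[] for _ in range(n)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    hidden = [list(f) for f in atom_features]
    for _ in range(rounds):
        updated = []
        for i in range(n):
            # Aggregate the atom's own state with its neighbors' states.
            agg = list(hidden[i])
            for j in neighbors[i]:
                agg = [x + y for x, y in zip(agg, hidden[j])]
            # Linear transform followed by ReLU.
            updated.append([max(0.0, sum(weight[r][c] * agg[c] for c in range(dim)))
                            for r in range(dim)])
        hidden = updated
    return [sum(h[k] for h in hidden) for k in range(dim)]

def fit_stump(X, residuals):
    """Find the single-feature threshold split with the lowest squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value

def fit_boosting(X, y, n_trees=30, lr=0.3):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# One-hot atom types and heavy-atom graphs (hydrogens omitted).
C, N, O = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
WEIGHT = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]  # untrained
molecules = [
    ([C, C, O], [(0, 1), (1, 2)]),             # ethanol
    ([C, O], [(0, 1)]),                        # methanol
    ([C, C, C], [(0, 1), (1, 2)]),             # propane
    ([C, C, O, O], [(0, 1), (1, 2), (1, 3)]),  # acetic acid
    ([C, N], [(0, 1)]),                        # methylamine
    ([C, C], [(0, 1)]),                        # ethane
]
targets = [float(len(atoms)) for atoms, _ in molecules]  # synthetic label

features = [embed_molecule(atoms, bonds, WEIGHT) for atoms, bonds in molecules]
model = fit_boosting(features, targets)
train_preds = [predict(model, row) for row in features]
baseline_mse = mse(targets, [sum(targets) / len(targets)] * len(targets))
model_mse = mse(targets, train_preds)
```

The point of the split is that the boosted-tree stage only ever sees fixed-length vectors, so the graph model and the tree model can be swapped independently. In practice the embedding would come from a trained GNN and the second stage would be the XGBoost library itself rather than this stand-in.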