Department of Science, Faculty of Science, Yamagata University, 1-4-12 Kojirakawa, Yamagata 990-8560, Japan.
Molecules. 2021 Dec 7;26(24):7428. doi: 10.3390/molecules26247428.
The blood-brain barrier (BBB) controls the entry of chemicals from the blood into the brain. Because drugs that act on the brain must penetrate the BBB, rapid and reliable prediction of BBB penetration (BBBP) is helpful for drug development. In this study, free-form and in-blood-form datasets were prepared by modifying the original BBBP dataset, and the effects of the data modification were investigated. For each dataset, molecular descriptors were generated and used for BBBP prediction by machine learning (ML). For ML, each dataset was split into training, validation, and test data using the scaffold split algorithm used by MoleculeNet. This algorithm creates an unbalanced split and makes prediction difficult; however, we chose it in order to evaluate predictive performance for unknown compounds dissimilar to existing ones. The highest prediction score was obtained by the random forest model using 212 descriptors from the free-form dataset, and this score was higher than the best previously reported score obtained with the same split algorithm without any external database. Furthermore, a deep neural network gave a comparable result with only 11 descriptors from the free-form dataset, and the resulting descriptors suggested the importance of recognizing glucose-like characteristics in BBBP prediction.
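The scaffold split mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes that each molecule has already been assigned a scaffold key (for example, a Bemis-Murcko scaffold SMILES computed with a cheminformatics toolkit), and the function name `scaffold_split` and the 80/10/10 fractions are hypothetical choices for illustration. The sketch groups molecules by scaffold, places the largest groups in the training set first, and pushes the remaining, rarer scaffolds into the validation and test sets, which is why the test compounds tend to be dissimilar to the training compounds.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold is shared between subsets.

    `scaffolds` is a list of precomputed scaffold keys, one per molecule
    (assumption: they were generated beforehand, e.g. as scaffold SMILES).
    """
    # Group molecule indices by their scaffold key.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Largest scaffold groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            # The whole group still fits into the training budget.
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            # Training is full; the group fits into the validation budget.
            valid.extend(group)
        else:
            # Everything else goes to the test set.
            test.extend(group)
    return train, valid, test


# Toy input: ten molecules with four hypothetical scaffold labels.
example_scaffolds = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
train_idx, valid_idx, test_idx = scaffold_split(example_scaffolds)
print(train_idx, valid_idx, test_idx)
```

Because whole scaffold groups are assigned to a single subset, the class balance of the subsets is not controlled, which is consistent with the unbalanced split described above.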