Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea.
Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea.
Bioinformatics. 2021 May 23;37(8):1135-1139. doi: 10.1093/bioinformatics/btaa918.
Determining whether a compound can cross the blood-brain barrier (BBB) is a major challenge in neurotherapeutic drug discovery. Conventional approaches for measuring BBB permeability are expensive, time-consuming and labor-intensive. BBB permeability is associated with diverse chemical properties of compounds. However, existing BBB permeability prediction models have been developed using small datasets and limited features, and are often impractical because they cover little of the chemical diversity of compounds. The aim of this study is to develop a BBB permeability prediction model using a large dataset for practical applications. This model can be used to facilitate compound screening in the early stages of brain drug discovery.
A dataset of 7162 compounds with known BBB permeability (5453 BBB+ and 1709 BBB-) was compiled from the literature, where BBB+ and BBB- denote BBB-permeable and non-permeable compounds, respectively. We trained a machine learning model based on the Light Gradient Boosting Machine (LightGBM) algorithm and, under 10-fold cross-validation, achieved an overall accuracy of 89%, an area under the curve (AUC) of 0.93, a specificity of 0.77 and a sensitivity of 0.93. The model was further evaluated on 74 central nervous system compounds (39 BBB+ and 35 BBB-) obtained from the literature and showed an accuracy of 90%, a sensitivity of 0.85 and a specificity of 0.94. Our model outperforms existing BBB permeability prediction models.
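Because the dataset is imbalanced (about three BBB+ compounds for every BBB- compound), the overall accuracy is a class-size-weighted mix of sensitivity and specificity. The sketch below is not part of the published method; it only illustrates that relationship using the figures reported above. The function name `overall_accuracy` is hypothetical, and since the published rates are rounded to two decimals, the implied accuracies can differ from the reported ones by about one percentage point.

```python
def overall_accuracy(n_pos, n_neg, sensitivity, specificity):
    """Accuracy implied by per-class rates and class sizes.

    sensitivity = fraction of BBB+ compounds predicted as BBB+
    specificity = fraction of BBB- compounds predicted as BBB-
    """
    true_pos = sensitivity * n_pos   # expected correctly predicted BBB+
    true_neg = specificity * n_neg   # expected correctly predicted BBB-
    return (true_pos + true_neg) / (n_pos + n_neg)


# Figures taken from the abstract (rates are rounded, so results are approximate).
cv_accuracy = overall_accuracy(n_pos=5453, n_neg=1709,
                               sensitivity=0.93, specificity=0.77)
ext_accuracy = overall_accuracy(n_pos=39, n_neg=35,
                                sensitivity=0.85, specificity=0.94)

# Accuracy of always predicting the majority class (BBB+), for comparison.
majority_baseline = 5453 / (5453 + 1709)

print(f"cross-validation: {cv_accuracy:.3f}")
print(f"external set:     {ext_accuracy:.3f}")
print(f"majority class:   {majority_baseline:.3f}")
```

Both implied accuracies come out near 0.89, in line with the reported 89% and 90%, while always predicting BBB+ would score about 0.76 on the training data. This is why the specificity of 0.77 is worth reading alongside the headline accuracy.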
The prediction server is available at http://ssbio.cau.ac.kr/software/bbb.