Yang Hongbin, Li Xiao, Cai Yingchun, Wang Qin, Li Weihua, Liu Guixia, Tang Yun
Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
Medchemcomm. 2017 Mar 29;8(6):1225-1234. doi: 10.1039/c7md00074j. eCollection 2017 Jun 1.
The subcellular localization of a chemical is closely related to drug distribution in the body and is therefore important in drug discovery and design. Although many in vivo and in vitro methods have been developed, in silico methods play key roles in predicting chemical subcellular localization owing to their low cost and high performance. To that end, machine learning-based methods were developed in this study. First, 614 unique compounds localized in the lysosome, mitochondria, nucleus and plasma membrane were collected from the literature. Of these compounds, 80% were used to build the models and the remaining 20% served as the external validation set. Both fingerprints and molecular descriptors were used to describe the molecules, and six machine learning methods were applied to build multi-class classification models. Model performance was measured by 5-fold cross-validation and by external validation. We further identified key substructures for each localization and analyzed potential structure-localization relationships, which could be helpful for molecular design and modification. The key substructures can also be used as features complementary to fingerprints to improve the performance of the models.
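The evaluation protocol described in the abstract (an 80/20 split into training and external validation sets, then 5-fold cross-validation on the training portion) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not name the six machine learning methods or the fingerprint types, so a k-nearest-neighbour vote over Tanimoto similarity stands in as one possible classifier. The fingerprints are synthetic bit sets that are separable by construction, not data from the study. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# The four localization classes named in the abstract.
LOCALIZATIONS = ["lysosome", "mitochondria", "nucleus", "plasma membrane"]

def make_toy_dataset(per_class=25, seed=0):
    """Synthetic fingerprints (assumption): 8 class-specific bits plus 2 noise bits."""
    rng = random.Random(seed)
    fps, labels = [], []
    for c, name in enumerate(LOCALIZATIONS):
        for _ in range(per_class):
            class_bits = rng.sample(range(c * 10, c * 10 + 10), 8)
            noise_bits = rng.sample(range(40, 64), 2)
            fps.append(frozenset(class_bits + noise_bits))
            labels.append(name)
    return fps, labels

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Hold out test_fraction of each class; return (train_idx, test_idx)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def k_folds(indices, k=5, seed=0):
    """Shuffle the indices and deal them into k disjoint folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def knn_predict(query, fit_fps, fit_labels, k=3):
    """Majority vote among the k most Tanimoto-similar reference compounds."""
    ranked = sorted(range(len(fit_fps)),
                    key=lambda i: tanimoto(query, fit_fps[i]), reverse=True)
    votes = Counter(fit_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(fps, labels, fit_idx, eval_idx, k=3):
    """Fraction of eval_idx compounds predicted correctly from fit_idx compounds."""
    fit_fps = [fps[i] for i in fit_idx]
    fit_labels = [labels[i] for i in fit_idx]
    hits = sum(knn_predict(fps[i], fit_fps, fit_labels, k) == labels[i]
               for i in eval_idx)
    return hits / len(eval_idx)

def cross_validate(fps, labels, train_idx, n_folds=5, seed=0):
    """Mean accuracy over n_folds, each fold held out once."""
    scores = []
    for held_out in k_folds(train_idx, n_folds, seed):
        held = set(held_out)
        fit_idx = [i for i in train_idx if i not in held]
        scores.append(accuracy(fps, labels, fit_idx, held_out))
    return sum(scores) / len(scores)

if __name__ == "__main__":
    fps, labels = make_toy_dataset()
    train_idx, test_idx = stratified_split(labels)
    print("5-fold CV accuracy:", cross_validate(fps, labels, train_idx))
    print("external accuracy:", accuracy(fps, labels, train_idx, test_idx))
```

The external set is never used during cross-validation, so the two accuracies are independent estimates. With real compounds, the bit sets would come from a cheminformatics toolkit and the classifier would be whichever method is under study.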