Natural Product Informatics Research Center, KIST Gangneung Institute of Natural Products, Gangneung 25451, Republic of Korea.
Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Republic of Korea.
Nucleic Acids Res. 2019 Nov 18;47(20):e128. doi: 10.1093/nar/gkz743.
Chemical similarity searching is a basic research tool for finding small molecules that are similar in shape to known active molecules. Despite its popularity, it often fails to retrieve the local molecular features that are critical to functional activity related to target binding. To overcome this limitation, we developed a novel machine learning-based chemical binding similarity score that uses various evolutionary relationships of binding targets. Chemical similarity was defined as the probability that two chemical compounds bind to identical targets. Comprehensive and heterogeneous multiple target-binding chemical data were integrated into a paired data format and processed using multiple classification similarity-learning models with various levels of target evolutionary information. Encoding evolutionary information into chemical compounds through their binding targets substantially expanded the available chemical-target interaction data and significantly improved model performance. The output probability of our integrated model, referred to as ensemble evolutionary chemical binding similarity (ensECBS), was effective for finding hidden chemical relationships. The developed method can serve as a novel chemical similarity tool that uses evolutionarily conserved target binding information.
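The paired data format described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The compound names, target names, family assignments, and the helper `label_pairs` are all hypothetical, chosen only to show how labeling compound pairs at a coarser evolutionary level can yield more positive training pairs than labeling by identical targets alone. In a full pipeline, such labels would be paired with compound features and passed to a classifier, whose output probability would serve as the similarity score; that step is omitted here.

```python
from itertools import combinations

# Hypothetical toy data: compound -> set of targets it binds.
BINDING = {
    "cpd_A": {"T1"},
    "cpd_B": {"T1", "T3"},
    "cpd_C": {"T2"},
    "cpd_D": {"T4"},
}

# Hypothetical evolutionary grouping: target -> family.
FAMILY = {"T1": "fam_X", "T2": "fam_X", "T3": "fam_Y", "T4": "fam_Z"}


def label_pairs(binding, group_of=None):
    """Return {(cpd_i, cpd_j): 1 or 0} for every unordered compound pair.

    A pair is positive (1) when the two compounds share at least one target
    or, if `group_of` is given, at least one target group (e.g. a family).
    """
    def keys(cpd):
        targets = binding[cpd]
        if group_of is None:
            return set(targets)
        return {group_of[t] for t in targets}

    labels = {}
    for a, b in combinations(sorted(binding), 2):
        labels[(a, b)] = 1 if keys(a) & keys(b) else 0
    return labels


# Labels at two levels of target evolutionary information.
target_level = label_pairs(BINDING)          # identical target required
family_level = label_pairs(BINDING, FAMILY)  # same target family suffices

print(sum(target_level.values()), "positive pairs at target level")
print(sum(family_level.values()), "positive pairs at family level")
```

In this toy data set, only cpd_A and cpd_B share an identical target, whereas grouping targets by family also links cpd_C to both of them. This mirrors, on a small scale, how encoding evolutionary information can expand the usable interaction data.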