Kireeva Natalia V, Ovchinnikova Svetlana I, Kuznetsov Sergey L, Kazennov Andrey M, Tsivadze Aslan Yu
Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky Prospect, 31a, 119071, Moscow, Russia.
J Comput Aided Mol Des. 2014 Feb;28(2):61-73. doi: 10.1007/s10822-014-9719-1. Epub 2014 Feb 4.
This study concerns the large margin nearest neighbor classifier and its multi-metric extension as efficient approaches to metric learning, which aims to learn an appropriate distance/similarity function for the case studies considered. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve performance in classification, clustering and retrieval tasks. The paper describes the application of the metric learning approach to the in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in the drug discovery process. In silico assessment of chemical liabilities is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, distance-based metric learning procedures have been applied to the in silico assessment of chemical liabilities. The impact of metric learning on structure-activity landscapes and on the predictive performance of the developed models has been analyzed, and the learned metric was also used in support vector machines. The metric learning results are illustrated using linear and non-linear data visualization techniques to show how the change of metric affects nearest-neighbor relations and the descriptor space.
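As an illustration of what a change of metric means for nearest-neighbor relations and for a kernel method, the following minimal sketch may help. It assumes the learned metric has the Mahalanobis form M = LᵀL, as in the standard formulation of large margin nearest neighbor learning, so that the distance between x and y is the Euclidean norm of L(x − y). The matrix `L_learned` is hand-picked for the example rather than learned, and all function names, descriptor values and the `gamma` parameter are hypothetical; none of them come from the paper.

```python
import math

def transform(L, x):
    """Apply the linear map L (a list of rows) to the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def learned_distance(L, x, y):
    """Distance under the metric M = L^T L, i.e. ||L(x - y)||_2."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(v * v for v in transform(L, diff)))

def rbf_kernel(L, x, y, gamma=1.0):
    """RBF kernel that uses the learned distance in place of the Euclidean one."""
    return math.exp(-gamma * learned_distance(L, x, y) ** 2)

def nearest_neighbor(L, query, points):
    """Index of the point closest to `query` under the metric defined by L."""
    return min(range(len(points)),
               key=lambda i: learned_distance(L, query, points[i]))

# Hypothetical two-descriptor example (values are illustrative only).
L_identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean metric
L_learned = [[1.0, 0.0], [0.0, 0.1]]    # assumed metric that down-weights descriptor 2

query = [0.0, 0.0]
points = [[1.0, 0.0], [0.0, 2.0]]

print(nearest_neighbor(L_identity, query, points))  # neighbor under the Euclidean metric
print(nearest_neighbor(L_learned, query, points))   # neighbor under the assumed learned metric
```

In this toy setting the nearest neighbor of the query changes once the second descriptor is down-weighted, which is the kind of effect on neighbor relations the abstract refers to. The same distance can be plugged into a kernel, as in `rbf_kernel`, which is one plausible way a learned metric could be used in a support vector machine.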