Varin Thibault, Bureau Ronan, Mueller Christoph, Willett Peter
Centre d'Etudes et de Recherche sur le Médicament de Normandie, UPRES EA4258, INC3M FR CNRS 3038, Université de Caen, Boulevard Becquerel, 14032 Caen Cedex, France.
J Mol Graph Model. 2009 Sep;28(2):187-95. doi: 10.1016/j.jmgm.2009.06.006. Epub 2009 Jul 4.
Ward's method is extensively used for clustering chemical structures represented by 2D fingerprints. This paper compares Ward clusterings of 14 datasets (containing between 278 and 4332 molecules) with those obtained using the Székely-Rizzo clustering method, a generalization of Ward's method. The clusters resulting from these two methods were evaluated by the extent to which the various classifications were able to group active molecules together, using a novel criterion of clustering effectiveness. Analysis of a total of 1400 classifications (Ward and Székely-Rizzo clustering methods, 14 different datasets, 5 different fingerprints and 10 different distance coefficients) demonstrated the general superiority of the Székely-Rizzo method. The distance coefficient first described by Soergel performed extremely well in these experiments, and this was also the case when it was used in simulated virtual screening experiments.
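The abstract does not state the formula for the Soergel distance coefficient, so the following is a minimal sketch assuming its commonly cited definition: the sum of absolute differences divided by the sum of element-wise maxima. For binary fingerprints this is the complement of the Tanimoto similarity. The names `soergel_distance`, `tanimoto_similarity`, `fp_a` and `fp_b` are illustrative, and the two 8-bit fingerprints are invented toy data, not taken from the paper's datasets.

```python
def soergel_distance(x, y):
    """Soergel distance between two equal-length non-negative vectors.

    Assumed definition: sum(|x_i - y_i|) / sum(max(x_i, y_i)).
    Two all-zero vectors are treated as identical (distance 0.0).
    """
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0


def tanimoto_similarity(x, y):
    """Tanimoto similarity for binary fingerprints: |A and B| / |A or B|."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 1.0


# Toy 8-bit fingerprints (hypothetical, for illustration only).
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 1, 0]

distance = soergel_distance(fp_a, fp_b)
similarity = tanimoto_similarity(fp_a, fp_b)
print(distance, similarity)
```

For this pair the fingerprints differ in two bit positions and five positions are set in at least one of them, so the distance is 2/5 and the Tanimoto similarity is 3/5. A pairwise matrix of such distances is the kind of input a hierarchical method such as Ward's would take, although the abstract does not describe how the authors implemented either clustering method.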