Wang Lin, Jiang Minghu, Lu Yinghua, Sun Minfu, Noe Frank
Biomedical Center, School of Electronics Eng., Beijing Univ. of Posts and Telecom., Beijing, 100876, China.
Int J Neural Syst. 2007 Dec;17(6):447-58. doi: 10.1142/S0129065707001287.
The aim of this research is to use three clustering techniques to establish molecular data models of large data sets by comparing low energy samples (LES) with local molecular samples (LMS). Three methods are used to analyze 6,242 LES and 5,000 LMS: hierarchical clustering, which builds a multi-level tree of distance relations; a competitive learning network, in which similar inputs fall into the same cluster; and a topology-preserving self-organizing map (SOM). Our experiments show that in the SOM, both the Davies-Bouldin clustering index and the color-map cluster units indicate 24 to 25 clusters in the LES, more than the 10 to 12 in the LMS, which roughly agrees with the results of hierarchical clustering and the competitive learning network. Hierarchical clustering shows that the largest inter-cluster distance of the LES (about 30) is far larger than that of the LMS (about 10), and the intra-cluster distance of the LES (about 15) is likewise far larger than that of the LMS (about 3). In the SOM, the D-matrix and U-matrix of the LES show more high-valued (black) cluster borders, which reflect large distances, and more clusters than those of the LMS. This is because the largest range of the sample features, measured in standard deviations, is wider for the LES (from -8 to 10) than for the LMS (from -2.5 to 2.5).
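The abstract uses the Davies-Bouldin index to settle on a cluster count and compares inter-cluster and intra-cluster distances between the two sample sets. The following is a minimal Python sketch of those measurements, not the authors' code. It assumes k-means as a stand-in clusterer (the abstract does not say how the candidate partitions behind the index were produced), it uses centroid-based distances that may differ from the dendrogram distances reported above, and all function names (`kmeans`, `davies_bouldin`, `distance_summary`, `best_k`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def group(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    """Collect the points belonging to each cluster label."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation (assumed stand-in)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point whose nearest existing centre is farthest away.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        clusters = group(points, labels)
        # An empty cluster keeps its previous centre.
        new = [centroid(clusters[j]) if j in clusters else centers[j] for j in range(k)]
        if new == centers:
            break
        centers = new
    return labels


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean over clusters i of max_j (S_i + S_j) / d(c_i, c_j); lower is better."""
    clusters = group(points, labels)
    cents = {j: centroid(ps) for j, ps in clusters.items()}
    # S_i: average distance of a cluster's members to its centroid.
    scatter = {j: sum(math.dist(p, cents[j]) for p in ps) / len(ps) for j, ps in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in clusters if j != i) for i in clusters]
    return sum(worst) / len(worst)


def distance_summary(points: Sequence[Point], labels: Sequence[int]) -> Tuple[float, float]:
    """(largest centroid-to-centroid distance, mean member-to-centroid distance)."""
    clusters = group(points, labels)
    cents = {j: centroid(ps) for j, ps in clusters.items()}
    inter = max(math.dist(a, b) for a in cents.values() for b in cents.values())
    intra = sum(math.dist(p, cents[j]) for j, ps in clusters.items() for p in ps) / len(points)
    return inter, intra


def best_k(points: Sequence[Point], ks) -> Tuple[int, Dict[int, float]]:
    """Pick the cluster count with the smallest Davies-Bouldin index."""
    scores = {k: davies_bouldin(points, kmeans(points, k)) for k in ks}
    return min(scores, key=scores.get), scores


# Usage: three tight, well-separated groups of 2-D points.
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
data = [(bx + dx, by + dy) for bx, by in blobs for dx, dy in offsets]
k, scores = best_k(data, range(2, 6))
inter, intra = distance_summary(data, kmeans(data, k))
print(k, round(inter, 2), round(intra, 2))  # 3 14.14 0.57
```

Lower Davies-Bouldin values mean tighter, better-separated clusters, so the sketch keeps the k with the smallest index. On real LES or LMS descriptors the scan would cover a wider range of k, and a larger inter-cluster distance together with a larger intra-cluster distance, as reported for the LES, would show up directly in the two numbers returned by `distance_summary`.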
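The SOM analysis and its U-matrix can likewise be sketched in a few lines. This is a hedged illustration under stated assumptions, not the paper's configuration: a rectangular grid, a Gaussian neighborhood whose radius shrinks linearly, a simplified U-matrix that stores the mean distance from each unit to its four grid neighbors, and z-score standardization of the features. The `winner_only` flag, which updates only the best-matching unit, is an assumed stand-in for the competitive learning network; the D-matrix is not reproduced. All names (`standardize`, `bmu`, `train_som`, `u_matrix`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Grid = Dict[Tuple[int, int], Vec]


def standardize(data: Sequence[Sequence[float]]) -> List[Vec]:
    """Z-score every feature column (mean 0, population standard deviation 1)."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]


def bmu(weights: Grid, x: Sequence[float]) -> Tuple[int, int]:
    """Grid position of the best-matching unit (nearest prototype) for input x."""
    return min(weights, key=lambda rc: math.dist(weights[rc], x))


def train_som(data: List[Vec], rows: int, cols: int, epochs: int = 30,
              lr0: float = 0.5, winner_only: bool = False, seed: int = 0) -> Grid:
    """Online SOM training; winner_only=True gives plain winner-take-all updates."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2
    # Initialise every prototype from a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                   # learning rate decays linearly to 0
            sigma = sigma0 + (0.5 - sigma0) * frac  # neighbourhood radius shrinks to 0.5
            wr, wc = bmu(weights, x)
            for (r, c), w in weights.items():
                if winner_only:
                    h = 1.0 if (r, c) == (wr, wc) else 0.0
                else:
                    h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def u_matrix(weights: Grid, rows: int, cols: int) -> List[List[float]]:
    """Mean distance from each unit's prototype to its 4-connected grid neighbours."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[r][c] = sum(math.dist(weights[(r, c)], weights[n]) for n in nbrs) / len(nbrs)
    return u


# Usage: two synthetic, well-separated groups of 2-D feature vectors.
rng = random.Random(1)
raw = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(40)]
       + [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(40)])
data = standardize(raw)
som = train_som(data, 4, 4)
for row in u_matrix(som, 4, 4):
    print(" ".join(f"{v:.2f}" for v in row))
```

Large U-matrix entries mark units whose prototypes lie far from their grid neighbors; rendered as a gray-scale map, these are the dark cluster borders the abstract describes. The grid needs at least two units so that every unit has a neighbor.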