V-cluster algorithm: a new algorithm for clustering molecules based upon numeric data.

Authors

Xu Jun, Zhang Qiang, Shih Chen-Kon

Affiliation

Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, Ridgefield, Connecticut 06877-0368, USA.

Publication information

Mol Divers. 2006 Aug;10(3):463-78. doi: 10.1007/s11030-006-9023-7. Epub 2006 Aug 1.

Abstract

Clustering molecules based on numeric data, such as gene-expression data, physicochemical properties, or theoretical data, is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for example, K-means and K-nearest neighbor), and other similar methods (for example, the Self-Organizing Map (SOM) and the Support Vector Machine (SVM)). These approaches are not robust (their results are not consistent) and are computationally expensive. This paper reports a new, non-hierarchical algorithm called the V-Cluster Algorithm (V stands for vector). This algorithm produces rational, robust results while reducing computational complexity. Similarity measurement and data normalization rules are also discussed, along with case studies. When molecules are represented as a set of numeric vectors, the V-Cluster Algorithm clusters the molecules in three steps: (1) ranking the vectors based upon their overall intensity levels, (2) computing cluster centers based upon neighboring density, and (3) assigning molecules to their nearest cluster center. The program is written in C/C++ and runs on Windows 95/NT and UNIX platforms. With the V-Cluster program, the user can quickly complete the clustering process and easily examine the results by means of thumbnail graphs, superimposed intensity curves of vectors, and spreadsheets. Multi-functional query tools have also been implemented.
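
The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' C/C++ implementation, and the abstract does not give the exact definitions, so every detail here is an assumption: "overall intensity" is taken as the sum of a vector's components, "neighboring density" as the number of other vectors within a Euclidean `radius`, and a vector becomes a center only if it is dense enough and lies farther than `radius` from every center already chosen. The names `v_cluster_sketch`, `radius`, and `min_neighbors` are hypothetical.

```python
import math


def intensity(v):
    # Assumption: overall intensity level = sum of the vector's components.
    return sum(v)


def distance(a, b):
    # Assumption: Euclidean distance as the (dis)similarity measure.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def v_cluster_sketch(vectors, radius, min_neighbors=1):
    """Illustrative three-step clustering; returns (center_indices, labels)."""
    n = len(vectors)
    if n == 0:
        return [], []

    # Step 1: rank vectors by overall intensity, highest first.
    order = sorted(range(n), key=lambda i: intensity(vectors[i]), reverse=True)

    # Step 2: pick cluster centers from neighboring density.
    # Density of a vector = number of other vectors within `radius` (assumed).
    density = [
        sum(1 for j in range(n)
            if j != i and distance(vectors[i], vectors[j]) <= radius)
        for i in range(n)
    ]
    centers = []
    for i in order:
        if density[i] < min_neighbors:
            continue  # too isolated to serve as a center
        if all(distance(vectors[i], vectors[c]) > radius for c in centers):
            centers.append(i)
    if not centers:
        # Fallback (assumption): use the top-ranked vector as the only center.
        centers = [order[0]]

    # Step 3: assign every vector to its nearest cluster center.
    labels = [
        min(range(len(centers)),
            key=lambda k: distance(v, vectors[centers[k]]))
        for v in vectors
    ]
    return centers, labels
```

Calling `v_cluster_sketch(vectors, radius=1.0)` on a list of equal-length numeric tuples returns the indices of the chosen centers and, for each vector, the position of its assigned center in that list. The all-pairs density count in this sketch is quadratic in the number of vectors and makes no attempt to reproduce the reduced complexity the abstract reports.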

