V-cluster algorithm: a new algorithm for clustering molecules based upon numeric data.

Author information

Xu Jun, Zhang Qiang, Shih Chen-Kon

Affiliation

Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, Ridgefield, Connecticut 06877-0368, USA.

Publication information

Mol Divers. 2006 Aug;10(3):463-78. doi: 10.1007/s11030-006-9023-7. Epub 2006 Aug 1.

DOI:10.1007/s11030-006-9023-7

PMID:16896541

Abstract

Clustering molecules based on numeric data, such as gene-expression data, physicochemical properties, or theoretical data, is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for example, K-means and K-nearest neighbor), and other similar methods (for example, the Self-Organizing Map (SOM) and the Support Vector Machine (SVM)). These approaches are non-robust (their results are not consistent) and computationally expensive. This paper reports a new, non-hierarchical algorithm called the V-Cluster Algorithm (V stands for vector). The algorithm produces rational, robust results while reducing computational complexity. Similarity measurement and data normalization rules are also discussed, along with case studies. When molecules are represented as a set of numeric vectors, the V-Cluster Algorithm clusters them in three steps: (1) ranking the vectors by their overall intensity levels, (2) computing cluster centers based on neighboring density, and (3) assigning each molecule to its nearest cluster center. The program is written in C/C++ and runs on Windows 95/NT and UNIX platforms. With the V-Cluster program, the user can quickly complete the clustering process and easily examine the results using thumbnail graphs, superimposed intensity curves of vectors, and spreadsheets. Multi-functional query tools have also been implemented.
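
The three steps named in the abstract can be illustrated with a small sketch. The abstract does not define "overall intensity", "neighboring density", or the similarity measure, so everything concrete here is an assumption made for illustration only: intensity is taken as the sum of a vector's components, density as the number of other vectors within a hypothetical `radius`, and distance as Euclidean. The function name `v_cluster_sketch` and its parameters are likewise hypothetical. This is not the published V-Cluster implementation (which is written in C/C++).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def v_cluster_sketch(vectors, radius, min_neighbors=1):
    """Toy three-step clustering; returns (center_indices, labels)."""
    n = len(vectors)
    if n == 0:
        return [], []

    # Step 1: rank vectors by overall intensity
    # (assumption: intensity = sum of components, highest first).
    order = sorted(range(n), key=lambda i: (-sum(vectors[i]), i))
    rank = {idx: pos for pos, idx in enumerate(order)}

    # Step 2: neighbor density = number of other vectors within `radius`.
    density = []
    for i in range(n):
        count = sum(
            1
            for j in range(n)
            if j != i and euclidean(vectors[i], vectors[j]) <= radius
        )
        density.append(count)

    # Visit candidates from densest to sparsest, breaking ties by intensity
    # rank, so the procedure is deterministic (no random initialization).
    candidates = sorted(range(n), key=lambda i: (-density[i], rank[i]))
    centers = []
    for i in candidates:
        if density[i] < min_neighbors:
            continue
        # Accept a candidate only if it is not inside an existing cluster.
        if all(euclidean(vectors[i], vectors[c]) > radius for c in centers):
            centers.append(i)
    if not centers:
        # Fallback: use the most intense vector as the only center.
        centers.append(order[0])

    # Step 3: assign every vector to its nearest cluster center.
    labels = [
        min(range(len(centers)), key=lambda k: euclidean(v, vectors[centers[k]]))
        for v in vectors
    ]
    return centers, labels


if __name__ == "__main__":
    # Two well-separated groups of 2-D vectors (made-up illustrative data).
    data = [
        [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
        [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
    ]
    centers, labels = v_cluster_sketch(data, radius=1.0)
    print("center indices:", centers)
    print("labels:", labels)
```

Because the sketch uses raw Euclidean distances, it assumes the vector components are already on comparable scales; the normalization rules discussed in the paper are not reproduced here. The pairwise density count is quadratic in the number of vectors, which is fine for a demonstration but says nothing about the complexity of the original algorithm.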

Similar articles

Clustering of gene expression data: performance and similarity analysis.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.

K-ary clustering with optimal leaf ordering for gene expression data.

Bioinformatics. 2003 Jun 12;19(9):1070-8. doi: 10.1093/bioinformatics/btg030.

A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings.

Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1.

Knowledge-assisted recognition of cluster boundaries in gene expression data.

Artif Intell Med. 2005 Sep-Oct;35(1-2):171-83. doi: 10.1016/j.artmed.2005.02.007.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

Application of Multi-SOM clustering approach to macrophage gene expression analysis.

Infect Genet Evol. 2009 May;9(3):328-36. doi: 10.1016/j.meegid.2008.09.009. Epub 2008 Oct 17.

Clustering binary fingerprint vectors with missing values for DNA array data analysis.

Proc IEEE Comput Soc Bioinform Conf. 2003;2:38-47.

Microarray data clustering based on temporal variation: FCV with TSD preclustering.

Appl Bioinformatics. 2003;2(1):35-45.

Evaluation of clustering algorithms for gene expression data.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-7-S4-S17.

Cited by

Predicting dual-targeting anti-influenza agents using multi-models.

Mol Divers. 2015 Feb;19(1):123-34. doi: 10.1007/s11030-014-9552-4. Epub 2014 Oct 2.
