Convex clustering: an attractive alternative to hierarchical clustering.

Author information

Chen Gary K, Chi Eric C, Ranola John Michael O, Lange Kenneth

Affiliations

Department of Preventive Medicine, Biostatistics Division, University of Southern California, Los Angeles, California, United States of America.

Department of Electrical and Computer Engineering, Rice University, Houston, Texas, United States of America.

Publication information

PLoS Comput Biol. 2015 May 12;11(5):e1004228. doi: 10.1371/journal.pcbi.1004228. eCollection 2015 May.

Abstract

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.
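For orientation, the convex clustering objective mentioned in the abstract is usually written as F(U) = 1/2 * sum_i ||x_i - u_i||^2 + mu * sum_{i<j} w_ij * ||u_i - u_j||, where each data point x_i gets its own centroid u_i and the penalty pulls centroids together as mu grows. The Python sketch below only evaluates this objective for given centroids; it is not the proximal distance algorithm and not the CONVEXCLUSTER code. The function name, the toy data, the uniform weights, and the use of the Euclidean norm are all assumptions made for illustration.

```python
import math


def convex_clustering_objective(X, U, W, mu):
    """Evaluate 1/2 * sum_i ||x_i - u_i||^2 + mu * sum_{i<j} w_ij * ||u_i - u_j||.

    X  : list of n data points (each a list of p floats)
    U  : list of n candidate centroids, one per data point
    W  : n x n table of nonnegative weights (only entries with i < j are read)
    mu : nonnegative penalty strength
    Hypothetical helper for illustration; assumes the Euclidean norm.
    """
    # Fidelity term: how far each centroid sits from its own data point.
    fit = 0.5 * sum(
        (x - u) ** 2 for xi, ui in zip(X, U) for x, u in zip(xi, ui)
    )
    # Fusion penalty: weighted distances between every pair of centroids.
    penalty = 0.0
    n = len(U)
    for i in range(n):
        for j in range(i + 1, n):
            penalty += W[i][j] * math.dist(U[i], U[j])
    return fit + mu * penalty


# Toy data (assumed): two nearby points and one distant point, uniform weights.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
W = [[1.0] * 3 for _ in range(3)]

# Extreme 1: every point keeps its own centroid (no fusion), so fit = 0.
no_fusion = convex_clustering_objective(X, X, W, mu=0.1)

# Extreme 2: all centroids fused at the grand mean, so penalty = 0.
mean = [sum(col) / len(X) for col in zip(*X)]
full_fusion = convex_clustering_objective(X, [mean] * 3, W, mu=0.1)
```

Comparing the two extremes shows the trade-off that the solution path traces out: for small mu the unfused configuration is cheaper, while for large mu the penalty dominates and fused centroids win.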
