General purpose computer-assisted clustering and conceptualization.

Author information

Department of Political Science, Stanford University, Encina Hall West, 616 Serra Street, Palo Alto, CA 94305, USA.

Publication information

Proc Natl Acad Sci U S A. 2011 Feb 15;108(7):2643-50. doi: 10.1073/pnas.1018067108. Epub 2011 Feb 3.

Abstract

We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimizes a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an "insightful" or "useful" way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given dataset (along with millions of other solutions we add based on combinations of existing clusterings) and enable a user to explore and interact with it and quickly reveal or prompt useful or insightful conceptualizations. In addition, although it is uncommon to do so in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than expert human coders or many existing fully automated methods.
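As a rough illustration of what a "metric space of partitions" could look like, the sketch below computes pairwise distances between a few clusterings of the same objects. The abstract does not say which distance is used, so variation of information is an assumption here, and the clusterings and the names `method_a`, `method_b`, and `method_c` are hypothetical examples rather than the authors' data or software.

```python
import math
from collections import Counter
from itertools import combinations


def variation_of_information(a, b):
    """Distance between two partitions given as equal-length label lists.

    Assumption: variation of information, VI = H(A|B) + H(B|A), is used
    as the metric; the abstract does not specify the distance.
    """
    if len(a) != len(b):
        raise ValueError("partitions must label the same objects")
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    count_ab = Counter(zip(a, b))
    vi = 0.0
    for (x, y), n_xy in count_ab.items():
        p_xy = n_xy / n
        p_x = count_a[x] / n
        p_y = count_b[y] / n
        # Each joint cell contributes to both conditional entropies.
        vi -= p_xy * (math.log(p_xy / p_x) + math.log(p_xy / p_y))
    return vi


def distance_matrix(partitions):
    """Pairwise distances between named partitions, as a symmetric dict."""
    names = list(partitions)
    dist = {(u, u): 0.0 for u in names}
    for u, v in combinations(names, 2):
        d = variation_of_information(partitions[u], partitions[v])
        dist[(u, v)] = dist[(v, u)] = d
    return dist


# Hypothetical clusterings of six objects from three different methods.
partitions = {
    "method_a": [0, 0, 0, 1, 1, 1],
    "method_b": [1, 1, 1, 0, 0, 0],  # same grouping as method_a, labels renamed
    "method_c": [0, 0, 1, 1, 2, 2],
}
dist = distance_matrix(partitions)
```

Because the distance depends only on how objects are grouped, `method_a` and `method_b` sit at the same point of the space even though their labels differ. A distance table of this kind is what a projection or interactive map of clusterings would be built on.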


Similar articles

1
General purpose computer-assisted clustering and conceptualization.
Proc Natl Acad Sci U S A. 2011 Feb 15;108(7):2643-50. doi: 10.1073/pnas.1018067108. Epub 2011 Feb 3.
2
Combining multiple clusterings using evidence accumulation.
IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):835-50. doi: 10.1109/TPAMI.2005.113.
3
Clustering ensembles: models of consensus and weak partitions.
IEEE Trans Pattern Anal Mach Intell. 2005 Dec;27(12):1866-81. doi: 10.1109/TPAMI.2005.237.
8
A knowledge-driven approach to biomedical document conceptualization.
Artif Intell Med. 2010 Jun;49(2):67-78. doi: 10.1016/j.artmed.2010.02.005. Epub 2010 Apr 3.
9
Toward the optimization of normalized graph Laplacian.
IEEE Trans Neural Netw. 2011 Apr;22(4):660-6. doi: 10.1109/TNN.2011.2107919. Epub 2011 Feb 28.
10
Unsupervised active learning based on hierarchical graph-theoretic clustering.
IEEE Trans Syst Man Cybern B Cybern. 2009 Oct;39(5):1147-61. doi: 10.1109/TSMCB.2009.2013197. Epub 2009 Mar 24.

Cited by

10
Estimating the cost of new public health legislation.
Bull World Health Organ. 2012 Jul 1;90(7):532-9. doi: 10.2471/BLT.11.097584. Epub 2012 May 8.
