Testing the rogue taxa hypothesis for clustering instability.

Author information

Bioinformatics Program and the University of Guelph, Canada.

Department of Mathematics and Statistics, University of Guelph, Canada.

Publication information

J Theor Biol. 2019 Jul 7;472:36-45. doi: 10.1016/j.jtbi.2019.04.002. Epub 2019 Apr 4.

Abstract

There have been longstanding concerns about the stability of hierarchical clustering. A suggested explanation for this instability is the presence of "rogue taxa", i.e. taxa whose removal from a data set can apparently restore stability. In this study, the rogue taxa hypothesis is tested by partitioning a large data set into many smaller ones and checking for rogue behavior. The checking was performed with a standard hierarchical clustering algorithm and with a novel algorithm designed to have greater stability. It was found that rogue taxa cannot reasonably be said to exist because the status of being a rogue taxon depends on the data partition in which the taxon is embedded. In addition to the choice of data used, the choice of algorithm and algorithm parameters can have a large effect on the degree to which a taxon appears rogue. Instability in hierarchical clustering can be increased by problematic data points, but the status of data points being problematic depends not on their biological antecedents, but on their position in the local geometry of the data. The results of this study strongly suggest that instability in traditional hierarchical clustering routines is primarily a problem with the algorithm design.
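The partition-and-check idea described in the abstract can be illustrated with a small sketch. This is not the authors' procedure or their novel algorithm, neither of which is specified here. It assumes plain average-linkage clustering cut at a fixed number of clusters `k`, and a hypothetical `rogue_score` defined as the fraction of pairs of the other taxa whose co-membership changes when one taxon is removed. The taxon names and coordinates are made-up toy data.

```python
from itertools import combinations
import math

def average_linkage(names, points, k):
    """Agglomerative clustering: merge the two clusters with the smallest
    average pairwise distance until only k clusters remain."""
    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def co_clustered(clusters, exclude=None):
    """Unordered pairs of taxa that share a cluster, ignoring `exclude`."""
    pairs = set()
    for cluster in clusters:
        members = [n for n in cluster if n != exclude]
        pairs.update(frozenset(p) for p in combinations(members, 2))
    return pairs

def rogue_score(taxon, subset, points, k=2):
    """Hypothetical proxy (an assumption, not the paper's measure):
    fraction of pairs of the other taxa whose co-membership changes
    when `taxon` is dropped from `subset`."""
    others = [n for n in subset if n != taxon]
    with_taxon = co_clustered(average_linkage(subset, points, k), exclude=taxon)
    without_taxon = co_clustered(average_linkage(others, points, k))
    n_pairs = len(others) * (len(others) - 1) // 2
    return len(with_taxon ^ without_taxon) / n_pairs

# Toy one-dimensional data; taxon "x" appears in two different partitions.
points = {
    "a": (0.0,), "b": (1.0,), "c": (4.0,),
    "x": (100.0,),
    "e": (101.0,), "f": (110.0,), "g": (111.0,),
}
partition_a = ["a", "b", "c", "x"]  # here "x" is a distant outlier
partition_b = ["x", "e", "f", "g"]  # here "x" sits next to "e"

score_a = rogue_score("x", partition_a, points)
score_b = rogue_score("x", partition_b, points)
print(round(score_a, 3))  # 0.667
print(round(score_b, 3))  # 0.0
```

In this toy case the same taxon disturbs the grouping of its neighbours in one partition and leaves it untouched in the other, and changing `k` or the linkage rule changes the scores as well. That is only a small-scale analogue of the dependence on data partition, algorithm, and parameters that the abstract reports.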
