School of Computer Engineering, Weifang University, Weifang, Shandong 261061, China.
School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China.
Comput Intell Neurosci. 2017;2017:2519782. doi: 10.1155/2017/2519782. Epub 2017 Oct 11.
Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions.
增量聚类算法在大规模数据分析和实时数据处理等各种应用中起着至关重要的作用。增量聚类的典型应用场景对硬件平台的计算能力提出了很高的要求。并行计算是满足这一需求的常用解决方案。此外,通用图形处理单元(GPGPU)是一种很有前途的并行计算设备。然而,当增量聚类算法由 GPGPU 提供支持时,它们在聚类精度和并行性之间面临着困境。我们正式分析了这种困境的原因。首先,我们形式化了与增量聚类相关的概念,如不断变化的粒度。其次,我们正式证明了两个定理。第一个定理证明了聚类精度和不断变化的粒度之间的关系。此外,该定理还分析了不同到相同错误关联的上下界。这种错误关联发生的次数越少,准确性就越高。第二个定理揭示了并行性和粒度之间的关系。较小的工作深度意味着更好的并行性。通过证明,我们得出结论,增量聚类算法的准确性与不断变化的粒度呈负相关,而并行性与粒度呈正相关。因此,这种矛盾的关系导致了困境。最后,我们通过一个演示算法验证了这些关系。实验结果验证了理论结论。