Wani Aasim Ayaz
School of Engineering, Cornell University, Ithaca, New York, United States.
PeerJ Comput Sci. 2024 Aug 29;10:e2286. doi: 10.7717/peerj-cs.2286. eCollection 2024.
This survey examines contemporary clustering algorithms in machine learning, focusing on five primary methodologies: centroid-based, hierarchical, density-based, distribution-based, and graph-based clustering. Through the lens of recent innovations such as deep embedded clustering and spectral clustering, we analyze the strengths and limitations of these methods and the breadth of their application domains, ranging from bioinformatics to social network analysis. The survey also contributes new material by integrating clustering techniques with dimensionality reduction and by proposing advanced ensemble methods to enhance stability and accuracy across varied data structures. This work synthesizes the latest advancements and offers new perspectives on overcoming traditional challenges such as scalability and noise sensitivity, providing a roadmap for future research and practical applications in data-intensive environments.