Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
Department of Pathology, Microbiology, and Immunology, School of Veterinary Medicine, University of California, Davis, USA.
BMC Res Notes. 2024 May 12;17(1):133. doi: 10.1186/s13104-024-06791-y.
The choice of an appropriate similarity measure plays a pivotal role in the effectiveness of clustering algorithms. However, many conventional measures rely solely on feature values to evaluate the similarity between objects to be clustered. Furthermore, the assumption of feature independence, while valid in certain scenarios, does not hold true for all real-world problems. Hence, considering alternative similarity measures that account for inter-dependencies among features can enhance the effectiveness of clustering in various applications.
In this paper, we present the Inv measure, a novel similarity measure founded on the concept of inversion. The Inv measure considers the significance of features, the values of all object features, and the feature values of other objects, leading to a comprehensive and precise evaluation of similarity. To assess the performance of our proposed clustering approach that incorporates the Inv measure, we evaluate it on simulated data using the adjusted Rand index.
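To make the two ingredients named above concrete, the following is a minimal Python sketch of (a) the general notion of an inversion in a sequence and (b) the adjusted Rand index used for evaluation. It is an illustration only: the abstract does not give the definition of the Inv measure, so `count_inversions` shows the generic combinatorial concept rather than the authors' measure, and both function names are assumptions introduced here.

```python
from collections import Counter
from math import comb


def count_inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j].

    Generic notion of an inversion; NOT the paper's Inv measure,
    whose definition is not given in the abstract.
    Quadratic time, kept simple for clarity.
    """
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same objects."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two objects: the labelings agree trivially.
        return 1.0

    # Contingency counts and their marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. a single cluster in both labelings.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    print(count_inversions([3, 1, 2]))
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The adjusted Rand index is invariant to how clusters are named, so a relabeled but otherwise identical partition scores 1, while agreement no better than chance scores around 0 or below.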
The simulation results strongly indicate that inversion-based clustering outperforms other methods when clusters are complex, that is, when they appear to overlap heavily. This demonstrates the practicality and effectiveness of the proposed approach and makes it a valuable choice for applications that involve complex clusters across various domains.
The inversion-based clustering approach may hold significant value in the healthcare industry, offering possible benefits in tasks like hospital ranking, treatment improvement, and high-risk patient identification. In social media analysis, it may prove valuable for trend detection, sentiment analysis, and user profiling. E-commerce may be able to utilize the approach for product recommendation and customer segmentation. The manufacturing sector may benefit from improved quality control, process optimization, and predictive maintenance. Additionally, the approach may be applied to traffic management and fleet optimization in the transportation domain. Its versatility and effectiveness make it a promising solution for diverse fields, providing valuable insights and optimization opportunities for complex and dynamic data analysis tasks.