Department of Computer Science and Software Engineering, Faculty of Science, Engineering and Technology, Swinburne University of Technology, Melbourne, Australia; University of Al-Qadisiyah, Al Diwaniyah, Iraq.
Department of Computer Science and Software Engineering, Faculty of Science, Engineering and Technology, Swinburne University of Technology, Melbourne, Australia.
Int J Med Inform. 2019 Jun;126:176-186. doi: 10.1016/j.ijmedinf.2019.03.016. Epub 2019 Mar 28.
Medical data stream clustering has become an integral part of medical decision systems because it extracts highly sensitive information from a massive flow of medical data. However, clustering and maintaining medical data streams is still a challenging task. This is because evolving medical data streams pose several challenges for clustering, such as discovering clusters of arbitrary shape, grouping data streams without a predefined number of clusters, and maintaining the data clusters dynamically.
Supporting online medical decisions requires addressing these clustering challenges. Therefore, in this paper, we propose an effective density-based clustering and dynamic maintenance framework that groups patients with similar symptoms into meaningful clusters and frequently monitors the patients' status.
For clustering, we generate a set of initial medical data clusters using an algorithm, called PAA+DBSCAN, that combines Piecewise Aggregate Approximation (PAA) with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). For maintenance, when new medical data streams arrive, we dynamically maintain the initially generated medical data clusters. Because incremental cluster maintenance is time-consuming, we further propose an Advanced Cluster Maintenance (ACM) approach to improve the performance of dynamic cluster maintenance.
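The abstract does not spell out the PAA+DBSCAN procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each patient is represented by a fixed-length series of readings (the sample data is hypothetical), that PAA replaces each of a chosen number of equal-length segments with its mean, and that textbook DBSCAN with Euclidean distance then groups the reduced vectors. The names `paa`, `region_query`, `dbscan`, `eps`, `min_pts` and `PATIENT_SERIES` are illustrative; the cluster-maintenance step (BCM/ACM) is not covered.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: replace each of `segments`
    equal-length pieces of `series` with the mean of that piece.
    Simplifying assumption of this sketch: len(series) is a multiple of `segments`."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("len(series) must be a positive multiple of segments")
    size = n // segments
    return [sum(series[k * size:(k + 1) * size]) / size for k in range(segments)]


def region_query(points: Sequence[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN: returns a cluster id (0, 1, ...) per point, or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = [j for j in neighbours if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: grow the cluster
        cluster_id += 1
    # Every label has been set by now; this only narrows the type to List[int].
    return [NOISE if lab is None else lab for lab in labels]


# Hypothetical data: 8 readings per patient (e.g. heart rate); not from the paper.
PATIENT_SERIES = [
    [70, 72, 71, 73, 70, 72, 71, 73],
    [71, 73, 72, 72, 71, 71, 72, 74],
    [69, 71, 70, 72, 71, 73, 70, 70],
    [110, 112, 111, 113, 110, 112, 111, 113],
    [111, 113, 112, 112, 111, 111, 112, 114],
    [109, 111, 110, 112, 111, 113, 110, 110],
    [90, 92, 91, 93, 90, 92, 91, 93],
]

if __name__ == "__main__":
    reduced = [paa(s, segments=4) for s in PATIENT_SERIES]  # 8 -> 4 dimensions
    print(dbscan(reduced, eps=3.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

With `eps=3.0` and `min_pts=3`, the two groups of similar patients form clusters 0 and 1 and the seventh patient is labelled noise, without fixing the number of clusters in advance. Running DBSCAN on the 4-dimensional PAA vectors instead of the raw 8-dimensional series makes each distance computation cheaper, which is one plausible source of a speed-up over exact DBSCAN; the abstract does not describe the authors' exact configuration.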
The experimental results on real-world medical datasets demonstrate the effectiveness and efficiency of our proposed approaches. The PAA+DBSCAN algorithm is more efficient and effective than the exact DBSCAN algorithm. Moreover, the ACM approach requires less running time than the Baseline Cluster Maintenance (BCM) approach across different tuning parameter values on all datasets. This is because the BCM approach tracks all the data points in the cluster.
The proposed framework clusters and maintains medical data streams effectively by grouping patients who share similar symptoms and by tracking the patients' status, which naturally tends to change over time.