
BACKGROUND

Anomaly detection is crucial for healthcare data because integrating smart technologies with healthcare introduces new security challenges. An anomaly in an electronic health record can indicate an insider attempting to access and manipulate the data. This article focuses on anomalies under different contexts.

METHODOLOGY

This research proposes a methodology to secure Electronic Health Records (EHRs) within a complex environment. We employed a systematic approach encompassing data preprocessing, labeling, modeling, and evaluation. Because anomalies are not labelled, a mechanism is required that predicts them with greater accuracy and fewer false positives. This research used unsupervised machine learning algorithms, including the Isolation Forest and Local Outlier Factor clustering algorithms. By calculating anomaly scores and validating the clustering with metrics such as the Silhouette Score and Dunn Score, we enhanced the capacity to secure sensitive healthcare data against evolving digital threats. Three variations of Isolation Forest (IForest) models (SVM, Decision Tree, and Random Forest) and three variations of Local Outlier Factor (LOF) models (SVM, Decision Tree, and Random Forest) were evaluated on accuracy, sensitivity, specificity, and F1 Score.
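The abstract does not spell out how the unsupervised detectors and the three classifiers are combined. One plausible reading, sketched below with scikit-learn, is that Isolation Forest and Local Outlier Factor each produce anomaly labels, those labels are checked with cluster-validity scores, and SVM, Decision Tree, and Random Forest classifiers are then trained on them. Everything specific here is an assumption for illustration: the synthetic data from the hypothetical `make_synthetic_records`, the `contamination=0.05` setting, the 70/30 split, and the helper names. It is not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import accuracy_score, pairwise_distances, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_records(seed=0):
    """Hypothetical stand-in for preprocessed EHR features (not real data)."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(loc=0.0, scale=1.0, size=(300, 4))
    unusual = rng.uniform(low=-8.0, high=8.0, size=(15, 4))
    return np.vstack([normal, unusual])


def dunn_index(X, labels):
    """Smallest between-group distance divided by largest within-group diameter."""
    D = pairwise_distances(X)
    groups = [np.where(labels == k)[0] for k in np.unique(labels)]
    max_diameter = max(D[np.ix_(g, g)].max() for g in groups)
    min_separation = min(
        D[np.ix_(a, b)].min() for i, a in enumerate(groups) for b in groups[i + 1:]
    )
    return min_separation / max_diameter


def pseudo_label(X, seed=0):
    """Return {detector: 0/1 labels}, where 1 marks a flagged anomaly."""
    iforest = IsolationForest(contamination=0.05, random_state=seed)
    lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
    # fit_predict returns -1 for outliers and +1 for inliers.
    return {
        "IForest": (iforest.fit_predict(X) == -1).astype(int),
        "LOF": (lof.fit_predict(X) == -1).astype(int),
    }


def run_pipeline(X, seed=0):
    """Train three classifiers on each detector's labels and collect scores."""
    results = {}
    for detector, y in pseudo_label(X, seed).items():
        validity = {"silhouette": silhouette_score(X, y), "dunn": dunn_index(X, y)}
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed
        )
        classifiers = {
            "SVM": SVC(),
            "Decision Tree": DecisionTreeClassifier(random_state=seed),
            "Random Forest": RandomForestClassifier(random_state=seed),
        }
        for name, clf in classifiers.items():
            clf.fit(X_tr, y_tr)
            accuracy = accuracy_score(y_te, clf.predict(X_te))
            results[(detector, name)] = {**validity, "accuracy": accuracy}
    return results


if __name__ == "__main__":
    for key, row in run_pipeline(make_synthetic_records()).items():
        print(key, {k: round(float(v), 3) for k, v in row.items()})
```

Running the script prints one row per detector and classifier pair, six in total, matching the six model variations named in the abstract. Because the classifiers are scored against detector-generated labels, the accuracy here measures agreement with the detector rather than with ground truth.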

RESULTS

Isolation Forest SVM achieves the highest accuracy (99.21%), with high sensitivity (99.75%) and specificity (99.32%) and an F1 Score of 98.72%. The Isolation Forest Decision Tree also performs well, with an accuracy of 98.92% and an F1 Score of 99.35%. However, the Isolation Forest Random Forest exhibits lower specificity (72.84%) than the other models.
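For readers less familiar with the four reported metrics, the sketch below computes them from the cells of a binary confusion matrix using their standard definitions, treating "anomaly" as the positive class. The function name `binary_metrics` and the counts in the usage example are made up for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F1 from confusion-matrix counts."""

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


if __name__ == "__main__":
    # Illustrative counts only: 100 true anomalies among 1000 records.
    print(binary_metrics(tp=90, fp=20, tn=880, fn=10))
```

Low specificity, as reported for one of the models, corresponds to a large `fp` count relative to `tn`, that is, many normal records flagged as anomalous.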

CONCLUSION

The experimental results show that Isolation Forest SVM is the top performer, demonstrating the effectiveness of these models in anomaly detection tasks. The proposed methodology, which combines Isolation Forest and SVM, produced better results by detecting anomalies with fewer false positives in this specific EHR dataset from a hospital in North England. Furthermore, the proposal is also able to identify new contextual anomalies that were not identified in the baseline methodology.

