An Unsupervised Error Detection Methodology for Detecting Mislabels in Healthcare Analytics.

Author information

Zhou Pei-Yuan, Lum Faith, Wang Tony Jiecao, Bhatti Anubhav, Parmar Surajsinh, Dan Chen, Wong Andrew K C

Affiliations

Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada.

AI Engineering Team, SpassMed Inc., Toronto, ON M5H 2S6, Canada.

Publication information

Bioengineering (Basel). 2024 Jul 31;11(8):770. doi: 10.3390/bioengineering11080770.

DOI:10.3390/bioengineering11080770

PMID:39199728

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11351123/

Abstract

Medical datasets may be imbalanced and may contain errors arising from subjective test results and clinical variability. Poor-quality source data reduces classification accuracy and reliability, so detecting abnormal samples in a dataset can help clinicians make better decisions. In this study, we propose an unsupervised error detection method that uses patterns discovered by the Pattern Discovery and Disentanglement (PDD) model developed in our earlier work. Applied to a large dataset, the eICU Collaborative Research Database, for sepsis risk assessment, the proposed algorithm discovers statistically significant association patterns, generates an interpretable knowledge base, clusters samples in an unsupervised manner, and detects abnormal samples in the dataset. In our experiments, the method outperformed K-Means by 38% on the full dataset and by 47% on the reduced dataset for unsupervised clustering. After abnormal samples were removed by the proposed error detection approach, the accuracy of multiple supervised classifiers improved by an average of 4%. The proposed algorithm therefore provides a robust and practical solution for unsupervised clustering and error detection in healthcare data.
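The general idea of flagging suspected mislabels from an unsupervised clustering can be illustrated with a small sketch. This is not the PDD-based method, whose details are not given in the abstract. It substitutes plain K-Means (the baseline named above) and assumes that a sample is "abnormal" when its recorded label disagrees with the majority label of its cluster. The function names `kmeans_assign` and `flag_suspected_mislabels`, the farthest-point initialisation, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_assign(points, k, iters=20):
    """Plain K-Means stand-in (NOT PDD); returns a cluster index per point.

    Uses a deterministic farthest-point initialisation (an assumption
    made here so the example is reproducible without a random seed).
    """
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        assign = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def flag_suspected_mislabels(points, labels, k=2):
    """Return indices whose label differs from their cluster's majority label."""
    assign = kmeans_assign(points, k)
    majority = {}
    for c in set(assign):
        in_cluster = [lab for lab, a in zip(labels, assign) if a == c]
        majority[c] = Counter(in_cluster).most_common(1)[0][0]
    return [i for i, (lab, a) in enumerate(zip(labels, assign)) if lab != majority[a]]


# Toy data (hypothetical): two well-separated groups, one flipped label at index 2.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (0.3, 0.1),
          (10.0, 10.0), (10.2, 9.8), (9.9, 10.3), (10.1, 10.1)]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

suspects = flag_suspected_mislabels(points, labels, k=2)
cleaned = [(p, lab) for i, (p, lab) in enumerate(zip(points, labels)) if i not in suspects]
print(suspects)  # [2]
```

In a workflow like the one the abstract describes, the `cleaned` samples would then be passed to supervised classifiers and their accuracy compared with training on the unfiltered data. The majority-label rule is only one simple criterion, and the paper's pattern-based criterion may differ.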

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fc3/11351123/790bd2ecfda6/bioengineering-11-00770-g001.jpg
