Applying density-based outlier identifications using multiple datasets for validation of stroke clinical outcomes.

Affiliations

Center for Information Technology, National Institutes of Health, Bethesda, MD, United States.

Bioinformatics Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States; Department of Neurology, National Taiwan University Hospital, Taipei, Taiwan.

Publication information

Int J Med Inform. 2019 Dec;132:103988. doi: 10.1016/j.ijmedinf.2019.103988. Epub 2019 Oct 3.

Abstract

INTRODUCTION

Clinicians commonly use the modified Rankin Scale (mRS) and the Barthel Index (BI) to measure clinical outcome after stroke. Both are potential prediction targets for machine learning models of stroke outcome, so the quality of these measurements is crucial for training and validating such models. The objective of this study was to apply and evaluate density-based outlier detection methods for identifying potentially incorrect measurements in multiple large stroke datasets, and thereby to assess measurement quality.

METHOD

We applied three density-based outlier detection methods: density-based spatial clustering of applications with noise (DBSCAN), hierarchical DBSCAN (HDBSCAN), and local outlier factor (LOF). The methods were developed on a large dataset obtained from a nationwide prospective stroke registry in Taiwan, and each was tested on four different NINDS-funded stroke datasets.
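As a rough illustration of the DBSCAN idea named above, the sketch below flags records that DBSCAN would label as noise: points that are neither core points nor within reach of one. It is not the authors' pipeline. The use of (mRS, BI) pairs as features, the division of BI by 20, the values of `eps` and `min_samples`, the synthetic records, and the helper name `dbscan_noise` are all assumptions made for this example. In practice a library implementation such as scikit-learn's `DBSCAN` or `LocalOutlierFactor` would normally be used instead of hand-written code.

```python
from collections import Counter


def dbscan_noise(points, eps, min_samples):
    """Return the set of distinct points that DBSCAN would label as noise.

    A point is a core point if at least `min_samples` records (itself
    included) lie within distance `eps` of it. A point is noise if it is
    neither a core point nor within `eps` of any core point.
    """
    counts = Counter(points)   # collapse duplicate records, keep multiplicity
    unique = list(counts)

    def neighbours(p):
        # All distinct points within Euclidean distance eps of p (p included).
        return [q for q in unique
                if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps ** 2]

    core = {p for p in unique
            if sum(counts[q] for q in neighbours(p)) >= min_samples}
    return {p for p in unique
            if p not in core and not any(q in core for q in neighbours(p))}


# Synthetic (mRS, BI) records, invented for this example only.
# Higher mRS (more disability) is paired with lower BI (less independence).
records = (
    [(0, 100)] * 30 + [(1, 95)] * 25 + [(2, 85)] * 20
    + [(3, 65)] * 20 + [(4, 40)] * 20 + [(5, 10)] * 20
    + [(5, 100), (0, 10)]      # two implausible combinations
)

# Assumption: divide BI by 20 so both features span a similar range (0 to 5).
scaled = [(m, bi / 20) for m, bi in records]
noise = dbscan_noise(scaled, eps=0.5, min_samples=5)

# Map the flagged points back to the original BI scale.
flagged = sorted((m, int(b * 20)) for m, b in noise)
print(flagged)
```

With these made-up records, only the two implausible pairs (severe disability with full independence, and no symptoms with near-total dependence) fall outside every dense region. The result depends heavily on the scaling and on `eps` and `min_samples`, which would need tuning on real registry data.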

RESULT

DBSCAN achieved high performance across all mRS values: the highest average accuracy was 99.2 ± 0.7 at an mRS of 4, and the lowest was 92.0 ± 4.6 at an mRS of 3. LOF achieved similar performance, whereas HDBSCAN with default parameter settings required further tuning.

CONCLUSION

Density-based outlier detection methods proved promising for validating stroke outcome measures. The outlier detection algorithm developed from a large prospective registry dataset was applied effectively to four different NINDS stroke datasets with high performance. A tool built on this algorithm could be applied to real-world datasets to improve the data quality of stroke outcome measures.
