Applying density-based outlier identifications using multiple datasets for validation of stroke clinical outcomes.

Affiliations

Center for Information Technology, National Institutes of Health, Bethesda, MD, United States.

Bioinformatics Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States; Department of Neurology, National Taiwan University Hospital, Taipei, Taiwan.

Publication information

Int J Med Inform. 2019 Dec;132:103988. doi: 10.1016/j.ijmedinf.2019.103988. Epub 2019 Oct 3.

DOI:10.1016/j.ijmedinf.2019.103988

PMID:31590140

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6880867/

Abstract

INTRODUCTION

Clinicians commonly use the modified Rankin Scale (mRS) and the Barthel Index (BI) to measure clinical outcome after stroke. These are potential targets in machine learning models for stroke outcome prediction. Therefore, the quality of the measurements is crucial for training and validation of these models. The objective of this study was to apply and evaluate density-based outlier detection methods for identifying potentially incorrect measurements in multiple large stroke datasets to assess the measurement quality.

METHOD

We applied three density-based outlier detection methods: density-based spatial clustering of applications with noise (DBSCAN), hierarchical DBSCAN (HDBSCAN), and local outlier factor (LOF). The methods were developed on a large dataset from a nationwide prospective stroke registry in Taiwan, and each was tested on four different NINDS-funded stroke datasets.
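To illustrate how a density-based method flags a suspect measurement, the following is a minimal pure-Python sketch of DBSCAN, in which points that belong to no dense region are labelled as noise. It assumes a single feature (BI scores among records that share one mRS value); that feature choice, the `eps` and `min_samples` values, and the sample scores are hypothetical and are not taken from the study.

```python
from math import dist

NOISE = -1  # label given to points that belong to no dense cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks potential outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned while expanding an earlier cluster
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical BI scores for records that all report mRS = 4 (illustrative only).
bi_scores = [30, 35, 35, 40, 40, 45, 45, 50, 100]
points = [(score,) for score in bi_scores]

# eps and min_samples are assumed values chosen for this toy example.
labels = dbscan(points, eps=10, min_samples=3)
outliers = [s for s, label in zip(bi_scores, labels) if label == NOISE]
print(outliers)
```

In this toy input, a BI of 100 recorded alongside an mRS of 4 has no close neighbours and is therefore flagged for review, while the remaining scores form one dense cluster.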

RESULT

DBSCAN performed well across all mRS values: the highest average accuracy was 99.2 ± 0.7 at an mRS of 4, and the lowest was 92.0 ± 4.6 at an mRS of 3. LOF achieved similar performance; HDBSCAN with default parameter settings, however, required further tuning.

CONCLUSION

Density-based outlier detection methods proved promising for validating stroke outcome measures. The outlier detection algorithm, developed from a large prospective registry dataset, was applied effectively to four different NINDS stroke datasets with high performance. A tool built on this algorithm could be applied to real-world datasets to improve data quality in stroke outcome measures.
