Using the Diagnostic Odds Ratio to Select Patterns to Build an Interpretable Pattern-Based Classifier in a Clinical Domain: Multivariate Sequential Pattern Mining Study.

Author Information

Casanova Isidoro J, Campos Manuel, Juarez Jose M, Gomariz Antonio, Lorente-Ros Marta, Lorente Jose A

Affiliations

AIKE Research Team (INTICO), Computer Science Faculty, University of Murcia, Murcia, Spain.

Murcian Bio-Health Institute (IMIB-Arrixaca), Murcia, Spain.

Publication Information

JMIR Med Inform. 2022 Aug 10;10(8):e32319. doi: 10.2196/32319.

DOI:10.2196/32319

PMID:35947437

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9403826/

Abstract

BACKGROUND

It is important to exploit all available data on patients in settings such as intensive care burn units (ICBUs), where several variables are recorded over time. It is possible to take advantage of the multivariate patterns that model the evolution of patients to predict their survival. However, pattern discovery algorithms generate a large number of patterns, of which only some are relevant for classification.

OBJECTIVE

We propose using the diagnostic odds ratio (DOR), rather than frequency properties, to select the multivariate sequential patterns used for classification in a clinical domain.
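For reference, the DOR of a pattern P can be written with the standard 2×2 definition for a binary test, assuming here that "P occurs in the patient's sequence" plays the role of a positive test for the class being predicted, with TP, FN, FP, and TN counting patients. The 95% CI shown is the common log-scale (Woolf) approximation; it is an assumed convention, not necessarily the exact interval used in the study.

```latex
\mathrm{DOR}(P) = \frac{TP / FN}{FP / TN} = \frac{TP \cdot TN}{FP \cdot FN},
\qquad
\mathrm{CI}_{95\%} = \exp\!\left( \ln \mathrm{DOR}(P) \pm 1.96 \sqrt{\frac{1}{TP} + \frac{1}{FN} + \frac{1}{FP} + \frac{1}{TN}} \right)
```

A DOR above 1 means the odds of finding P are higher among positive-class patients. When any cell is 0, the DOR or its logarithm is undefined without a zero-cell correction; a pattern that is absent from one class (a jumping emerging pattern) always has FP = 0 or TP = 0.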

METHODS

We used data obtained from the ICBU at the University Hospital of Getafe, where 6 temporal variables were recorded daily for 5 days for each of 465 patients. To model the evolution of these clinical variables, we used multivariate sequential patterns, applying 2 different discretization methods to the continuous attributes. We compared 4 ways of employing the DOR for pattern selection: (1) using it as a threshold to select patterns with a minimum DOR; (2) selecting patterns whose differential DORs with regard to their extensions are higher than a threshold; (3) selecting patterns whose DOR CIs do not overlap; and (4) a combination, which we propose, of the threshold and nonoverlapping CIs to select the most discriminative patterns. As a baseline, we compared our proposals with Jumping Emerging Patterns, one of the most frequently used techniques for pattern selection based on frequency properties.
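A minimal Python sketch of options (1), (3), and (4), assuming that each pattern is summarized by a 2×2 table of patient counts (positive class = the outcome being predicted), that the CI is the 95% log-scale (Woolf) interval with a 0.5 Haldane correction for zero cells, and, for option (3), that a pattern's CI is compared with the CI of the shorter pattern it extends. These conventions, the helper names (`PatternStats`, `dor_with_ci`, `select_*`), and the toy counts are hypothetical illustrations, not the authors' implementation. Option (2) is omitted because the abstract does not define the differential DOR. The `is_jep` helper uses the usual definition of a jumping emerging pattern: present in one class and absent from the other.

```python
import math
from dataclasses import dataclass

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


@dataclass
class PatternStats:
    """2x2 patient counts for one pattern (hypothetical structure)."""
    name: str
    tp: int  # positive-class patients whose sequence contains the pattern
    fn: int  # positive-class patients without the pattern
    fp: int  # negative-class patients with the pattern
    tn: int  # negative-class patients without the pattern


def dor_with_ci(s: PatternStats, z: float = Z95) -> tuple[float, float, float]:
    """Return (DOR, lower, upper) using the log-scale (Woolf) interval.

    Assumption: 0.5 is added to every cell when any cell is zero
    (Haldane correction), so JEP-like patterns get a finite DOR.
    """
    cells = [s.tp, s.fn, s.fp, s.tn]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    tp, fn, fp, tn = cells
    dor = (tp * tn) / (fp * fn)
    se = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    log_dor = math.log(dor)
    return dor, math.exp(log_dor - z * se), math.exp(log_dor + z * se)


def is_jep(s: PatternStats) -> bool:
    """Jumping emerging pattern: present in one class, absent from the other."""
    return (s.tp > 0 and s.fp == 0) or (s.fp > 0 and s.tp == 0)


def cis_overlap(a: PatternStats, b: PatternStats) -> bool:
    """True if the 95% DOR intervals of a and b intersect."""
    _, a_lo, a_hi = dor_with_ci(a)
    _, b_lo, b_hi = dor_with_ci(b)
    return a_lo <= b_hi and b_lo <= a_hi


def select_by_threshold(stats, min_dor):
    """Option 1: keep patterns whose DOR reaches min_dor."""
    return [s.name for s in stats if dor_with_ci(s)[0] >= min_dor]


def select_nonoverlapping(stats, parent):
    """Option 3 (as assumed here): keep a pattern only if its CI does not
    overlap the CI of the shorter pattern it extends; root patterns are kept."""
    by_name = {s.name: s for s in stats}
    return [s.name for s in stats
            if parent.get(s.name) is None
            or not cis_overlap(s, by_name[parent[s.name]])]


def select_combined(stats, parent, min_dor):
    """Option 4: a pattern must pass both the threshold and the CI test."""
    passing = set(select_by_threshold(stats, min_dor))
    return [n for n in select_nonoverlapping(stats, parent) if n in passing]


# Toy counts, invented for illustration: 100 positive and 365 negative patients.
stats = [
    PatternStats("A", tp=40, fn=60, fp=30, tn=335),
    PatternStats("AB", tp=20, fn=80, fp=2, tn=363),
    PatternStats("AC", tp=25, fn=75, fp=0, tn=365),
    PatternStats("B", tp=30, fn=70, fp=80, tn=285),
]
parent = {"AB": "A", "AC": "A"}  # AB and AC extend A; A and B are roots

print(select_by_threshold(stats, 2.0))       # ['A', 'AB', 'AC']
print(select_nonoverlapping(stats, parent))  # ['A', 'AC', 'B']
print(select_combined(stats, parent, 2.0))   # ['A', 'AC']
print([s.name for s in stats if is_jep(s)])  # ['AC']
```

In this toy run, AB has a much higher DOR than A but its interval still overlaps A's, so option (3) discards it as adding little beyond its parent. AC is a JEP, and with the Haldane correction its wide interval sits entirely above A's, so it survives; a different zero-cell convention could change which extensions are kept.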

RESULTS

We compared the number and length of the patterns eventually selected, the classification performance, and the interpretability of the patterns and models. We show that discretization has a great impact on the accuracy of the classification model, but that a trade-off must be found between classification accuracy and the physicians' capacity to interpret the patterns obtained. We also found that the experiments combining a threshold and nonoverlapping CIs (Option 4) obtained the fewest patterns, which were also the shortest, but that this gain in interpretability for clinicians came at the cost of losing acceptable accuracy. The best classification model according to this trade-off is a JRIP classifier with only 5 patterns (20 items), built using unsupervised correlation-preserving discretization and the differential DOR in a beam search for the best pattern. It achieves a specificity of 56.32% and an area under the receiver operating characteristic curve of 0.767.

CONCLUSIONS

A method for classifying patients' survival can benefit from sequential patterns, as these patterns capture knowledge about the temporal evolution of the variables in the ICBU setting. We have shown that the DOR can be used in several ways and that it is a suitable measure for selecting discriminative and interpretable quality patterns.
