AIKE research team (INTICO), Facultad de Informatica, University of Murcia, Campus de Espinardo, Murcia, 30100, Spain.
Murcian Bio-Health Institute (IMIB-Arrixaca), Murcia, Spain.
BMC Med Inform Decis Mak. 2024 Jun 13;24(1):165. doi: 10.1186/s12911-024-02566-4.
Pattern mining techniques are helpful tools when extracting new knowledge in real practice, but the overwhelming number of patterns is still a limiting factor in the health-care domain. Current efforts concerning the definition of measures of interest for patterns are focused on reducing the number of patterns and quantifying their relevance (utility/usefulness). However, although the temporal dimension plays a key role in medical records, few efforts have been made to extract temporal knowledge about the patient's evolution from multivariate sequential patterns.
In this paper, we propose a method to extract a new type of pattern in the clinical domain, called Jumping Diagnostic Odds Ratio Sequential Patterns (JDORSP). The method uses the odds ratio to identify a concise set of sequential patterns that represent a patient's state as a statistically significant protection factor (i.e., a pattern associated with patients who survive), together with those extensions whose evolution suddenly changes the patient's clinical state and turns the sequential pattern into a statistically significant risk factor (i.e., a pattern associated with patients who do not survive), or vice versa.
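The jump described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the significance test, so this sketch assumes a pattern is a significant risk factor when the 95% confidence interval of its odds ratio (Woolf's log method) lies entirely above 1, and a significant protection factor when it lies entirely below 1. The names `Counts`, `odds_ratio_ci`, `factor`, and `is_jump`, the zero-cell correction, and the patient counts are all hypothetical.

```python
import math
from typing import NamedTuple


class Counts(NamedTuple):
    """2x2 contingency table for one sequential pattern (hypothetical layout)."""
    match_died: int        # patients matching the pattern who did not survive
    match_survived: int    # patients matching the pattern who survived
    nomatch_died: int      # patients not matching who did not survive
    nomatch_survived: int  # patients not matching who survived


def odds_ratio_ci(c: Counts, z: float = 1.96) -> tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) using Woolf's log method.

    Assumption: z = 1.96 gives a conventional 95% interval; the paper's
    exact test may differ.
    """
    cells = [float(x) for x in c]
    if 0.0 in cells:
        # Haldane-Anscombe correction (an assumption) to avoid division by zero.
        cells = [x + 0.5 for x in cells]
    a, b, cc, d = cells
    odds_ratio = (a * d) / (b * cc)
    se = math.sqrt(1 / a + 1 / b + 1 / cc + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def factor(c: Counts) -> str:
    """Classify a pattern as 'risk', 'protection', or 'none' (not significant)."""
    _, lower, upper = odds_ratio_ci(c)
    if lower > 1:
        return "risk"
    if upper < 1:
        return "protection"
    return "none"


def is_jump(parent: Counts, extension: Counts) -> bool:
    """True when a pattern and its extension are significant in opposite directions."""
    return {factor(parent), factor(extension)} == {"risk", "protection"}


# Hypothetical cohort of 150 patients. The extension matches a subset of the
# patients matched by the parent pattern.
parent = Counts(match_died=10, match_survived=60, nomatch_died=40, nomatch_survived=40)
extension = Counts(match_died=8, match_survived=2, nomatch_died=42, nomatch_survived=98)

print(factor(parent), factor(extension), is_jump(parent, extension))
```

With these illustrative counts, the parent pattern is associated with survival while its extension is associated with death, so the pair would qualify as a jump under the assumptions stated above.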
The results of our experiments show that our method reduces the number of sequential patterns obtained with state-of-the-art pattern reduction methods by over 95%. Only by achieving this drastic reduction can medical experts carry out a comprehensive clinical evaluation of the patterns that might be considered medical knowledge regarding the temporal evolution of the patients. We have evaluated the surprisingness and relevance of the sequential patterns with clinicians, and the most interesting finding is the high surprisingness of the pattern extensions that become a protection factor, that is, those describing patients who recover after several days of being at high risk of dying.
Our proposed method for extracting JDORSP generates a set of interpretable multivariate sequential patterns that provide new knowledge regarding the temporal evolution of the patients. The number of patterns is greatly reduced compared with those generated by other methods and measures of interest. An additional advantage of the method is that it does not require any parameters or thresholds, and the reduced number of patterns allows manual evaluation.