Kwiatkowska Mila, Atkins M Stella, Ayas Najib T, Ryan C Frank
Computing Science Department, Thompson Rivers University, Kamloops, BC V2C 5N3, Canada.
IEEE Trans Inf Technol Biomed. 2007 Nov;11(6):651-60. doi: 10.1109/titb.2006.889693.
Clinical prediction rules play an important role in medical practice. They expedite diagnosis and limit unnecessary tests. However, the rule creation process is time-consuming and expensive. With current developments in efficient data mining algorithms and the growing accessibility of medical data, the creation of clinical rules can be supported by automated rule induction from data. A data-driven method based on the reuse of previously collected medical records and clinical trial statistics is cost-effective; however, it requires well-defined and intelligent methods for data analysis. This paper presents a new framework for knowledge representation for secondary data analysis and for the generation of a new typicality measure, which integrates medical knowledge into statistical analysis. The framework is based on a semiotic approach for contextual knowledge and fuzzy logic for approximate knowledge. This semio-fuzzy framework has been applied to the analysis of predictors for the diagnosis of obstructive sleep apnea, and was tested on two clinical data sets. Medical knowledge was represented by a set of facts and fuzzy rules, and was used to perform statistical analysis. Statistical methods provided several candidate outliers. Our new typicality measure identified the candidates that were medically significant, in the sense that removing them improved the descriptive model. This is a critical preprocessing step towards the automated induction of predictive rules from data. These experimental results demonstrate that knowledge-based methods integrated with statistical approaches provide a practical framework to support the generation of clinical prediction rules.
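The abstract does not define the typicality measure, so the following is only a minimal sketch of the general idea: a statistical test proposes candidate outliers, and a fuzzy rule encoding medical knowledge scores how typical each candidate is. Everything here is an assumption made for illustration, not the authors' method: the function names, the trapezoidal membership shapes, the numeric cut-offs (which are not clinical values), the rule "a large neck circumference typically accompanies a high BMI", and the definition of typicality as one minus the absolute difference of the two memberships.

```python
from statistics import mean, stdev


def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 outside (a, d), 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def z_score_candidates(values, threshold=2.0):
    """Indices whose z-score exceeds the threshold (purely statistical step)."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) / s > threshold]


def typicality(neck_cm, bmi):
    """Hypothetical fuzzy rule: large neck circumference goes with high BMI.

    Returns 1.0 when both memberships agree and 0.0 when they fully disagree.
    The cut-offs below are illustrative assumptions, not clinical thresholds.
    """
    large_neck = trapezoid(neck_cm, 38, 43, 60, 70)
    high_bmi = trapezoid(bmi, 25, 30, 60, 80)
    return 1.0 - abs(large_neck - high_bmi)


# Made-up records: (neck circumference in cm, BMI)
records = [(37, 22), (38, 24), (37, 23), (39, 25), (38, 24), (37, 23), (36, 60)]

# Step 1: statistics proposes candidate outliers on BMI alone.
candidates = z_score_candidates([bmi for _, bmi in records])

# Step 2: the knowledge-based score separates atypical candidates from
# extreme but medically plausible ones.
for i in candidates:
    neck, bmi = records[i]
    print(i, round(typicality(neck, bmi), 2))
```

In this sketch a candidate with a low typicality score is one that contradicts the encoded rule, which is the kind of record a knowledge-based filter would single out for review before rule induction.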