Downs S M, Wallace M Y
Departments of Pediatrics and Biomedical Engineering, University of North Carolina at Chapel Hill, USA.
Proc AMIA Symp. 2000:200-4.
The purpose of this study was to apply an unsupervised data mining algorithm to a database containing data collected at the point of care for clinical decision support. The data set was taken from the Child Health Improvement Program (CHIP), a preventive services tracking and reminder system in use at the University of North Carolina. The database contains over 30,000 visits. We used a previously described pattern discovery algorithm to extract second- and third-order association rules from the data and reviewed the literature to see whether the associations had been described before. The algorithm discovered 16 second-order associations and 103 third-order associations. The third-order associations contained no new information. The second-order associations demonstrated a covariance among a range of health risk behaviors. Additionally, the algorithm discovered that both tobacco smoke exposure and chronic cardiopulmonary disease are associated with failure on developmental screens. These relationships have been described before and have been attributed to underlying poverty. The work demonstrates the ability of unsupervised data mining by rule association on sparse clinical data to discover clinically important associations. However, many associations may be previously known or explained by confounding variables.
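The abstract does not specify the pattern discovery algorithm beyond calling it "previously described", so the following is only an illustrative stand-in, not the authors' method. It sketches how second-order (pairwise) associations might be screened in sparse visit data by comparing observed co-occurrence counts with the counts expected under independence (lift). The function name, item labels, thresholds, and toy visits are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def second_order_associations(visits, min_support=2, min_lift=1.5):
    """Return (item_a, item_b, lift) for item pairs that co-occur more often
    than expected if the two items were independent.

    visits      -- iterable of sets; each set holds the findings recorded at one visit
    min_support -- minimum number of visits in which the pair must co-occur (assumed threshold)
    min_lift    -- minimum observed/expected co-occurrence ratio (assumed threshold)
    """
    visits = list(visits)
    n = len(visits)
    item_counts = Counter()
    pair_counts = Counter()

    for visit in visits:
        items = sorted(set(visit))          # sort so each pair has one canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    results = []
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_support:
            continue
        expected = item_counts[a] * item_counts[b] / n   # count expected under independence
        lift = n_ab / expected
        if lift >= min_lift:
            results.append((a, b, lift))

    # Strongest associations first
    return sorted(results, key=lambda r: -r[2])


# Made-up toy data; not drawn from the CHIP database.
visits = [
    {"smoke_exposure", "dev_screen_fail"},
    {"smoke_exposure", "dev_screen_fail"},
    {"smoke_exposure"},
    {"no_seatbelt"},
    {"no_seatbelt", "no_bike_helmet"},
    {"no_bike_helmet", "no_seatbelt"},
    set(),
    set(),
]

rules = second_order_associations(visits)
for a, b, lift in rules:
    print(f"{a} <-> {b}: lift = {lift:.2f}")
```

A screen of this kind only flags co-occurrence. It cannot tell a direct relationship from one driven by a shared confounder such as poverty, which is the limitation the abstract itself raises.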