Carlos Ordonez
Teradata (NCR), San Diego, CA 92127, USA.
IEEE Trans Inf Technol Biomed. 2006 Apr;10(2):334-43. doi: 10.1109/titb.2006.864475.
Association rules are a promising technique for improving heart disease prediction. Unfortunately, when applied to a medical data set, they produce an extremely large number of rules. Most of these rules are medically irrelevant, and the time required to find them can be impractical. A more important issue is that association rules are generally mined on the entire data set without validation on an independent sample. To address these limitations, we introduce an algorithm that uses search constraints to reduce the number of rules, searches for association rules on a training set, and finally validates them on an independent test set. The medical significance of discovered rules is evaluated with support, confidence, and lift. Association rules are mined from a real data set containing medical records of patients with heart disease. In medical terms, the rules relate heart perfusion measurements and risk factors to the degree of disease in four specific arteries. Search constraints and test set validation significantly reduce the number of association rules and produce a set of rules with high predictive accuracy. We present important rules with high confidence, high lift, or both, that remain valid on the test set across several runs. These rules represent valuable medical knowledge.
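As a rough illustration of the three metrics named in the abstract and of the train-then-validate idea, the following Python sketch computes support, confidence, and lift for a single rule X => Y and checks whether the rule clears the same thresholds on a held-out test set. It uses the standard textbook definitions: support = P(X and Y), confidence = P(Y | X), lift = confidence / P(Y). Everything else is an assumption made for illustration: the item names (`age>=60`, `smoker`, `LAD>=70`), the toy records, the threshold values, and the helper names `rule_metrics` and `holds_on` are hypothetical and are not taken from the paper, whose actual algorithm, constraints, and data are not described in this abstract.

```python
from typing import Dict, FrozenSet, Iterable, List


def rule_metrics(
    records: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent => consequent."""
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")
    x = frozenset(antecedent)
    y = frozenset(consequent)
    n_x = sum(1 for r in records if x <= r)          # records containing X
    n_y = sum(1 for r in records if y <= r)          # records containing Y
    n_xy = sum(1 for r in records if (x | y) <= r)   # records containing both
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


def holds_on(
    records: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
    min_support: float,
    min_confidence: float,
    min_lift: float,
) -> bool:
    """True if the rule meets all three (hypothetical) thresholds on records."""
    m = rule_metrics(records, antecedent, consequent)
    return (
        m["support"] >= min_support
        and m["confidence"] >= min_confidence
        and m["lift"] >= min_lift
    )


# Toy, made-up records (NOT data from the paper); each record is a set of items.
train = [
    frozenset({"age>=60", "smoker", "LAD>=70"}),
    frozenset({"age>=60", "LAD>=70"}),
    frozenset({"smoker"}),
    frozenset({"age>=60", "smoker", "LAD>=70"}),
]
test = [
    frozenset({"age>=60", "LAD>=70"}),
    frozenset({"age>=60"}),
    frozenset({"smoker", "LAD>=70"}),
    frozenset({"smoker"}),
]

rule = ({"age>=60"}, {"LAD>=70"})
thresholds = dict(min_support=0.2, min_confidence=0.7, min_lift=1.1)

train_metrics = rule_metrics(train, *rule)
test_metrics = rule_metrics(test, *rule)

# A rule is kept only if it passes on the training set AND on the test set.
validated = holds_on(train, *rule, **thresholds) and holds_on(test, *rule, **thresholds)

print("train:", train_metrics)
print("test: ", test_metrics)
print("validated:", validated)
```

In this toy setup the rule looks strong on the training records but does not clear the confidence and lift thresholds on the test records, so it would be discarded. This is the kind of filtering that validation on an independent sample is meant to provide.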