The index lift in data mining has a close relationship with the association measure relative risk in epidemiological studies.

Affiliations

University of Alberta School of Public Health, Edmonton, AB, Canada.

Department of Computing Science, University of Alberta, Edmonton, AB, Canada.

Publication information

BMC Med Inform Decis Mak. 2019 Jun 17;19(1):112. doi: 10.1186/s12911-019-0838-4.

DOI:10.1186/s12911-019-0838-4

PMID:31208407

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6580490/

Abstract

BACKGROUND

Data mining tools have been increasingly used in health research, with the promise of accelerating discoveries. Lift is a standard association metric in the data mining community. However, health researchers struggle with the interpretation of lift. As a result, dissemination of data mining results can be met with hesitation. The relative risk and odds ratio are standard association measures in the health domain, due to their straightforward interpretation and comparability across populations. We aimed to investigate the lift-relative risk and the lift-odds ratio relationships, and provide tools to convert lift to the relative risk and odds ratio.

METHODS

We derived equations linking lift-relative risk and lift-odds ratio. We discussed how lift, relative risk, and odds ratio behave numerically with varying association strengths and exposure prevalence levels. The lift-relative risk relationship was further illustrated using a high-dimensional dataset which examines the association of exposure to airborne pollutants and adverse birth outcomes. We conducted spatial association rule mining using the Kingfisher algorithm, which identified association rules using its built-in lift metric. We directly estimated relative risks and odds ratios from 2 by 2 tables for each identified rule. These values were compared to the corresponding lift values, and relative risks and odds ratios were computed using the derived equations.
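The direct estimation step can be illustrated with a minimal sketch that computes lift, relative risk, and odds ratio from the four cells of a 2 by 2 table using their standard definitions. The function name, the cell labels `a`, `b`, `c`, `d`, and the example counts are hypothetical and are not taken from the study.

```python
def association_measures(a, b, c, d):
    """Compute (lift, relative risk, odds ratio) from a 2 by 2 table.

    Hypothetical cell labels (not from the paper):
      a = exposed with outcome      b = exposed without outcome
      c = unexposed with outcome    d = unexposed without outcome
    """
    n = a + b + c + d
    risk_exposed = a / (a + b)        # P(O | E)
    risk_unexposed = c / (c + d)      # P(O | not E)
    p_outcome = (a + c) / n           # P(O)

    lift = risk_exposed / p_outcome   # equals P(O and E) / (P(O) * P(E))
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    return lift, relative_risk, odds_ratio


# Made-up counts for illustration only.
lift, rr, odds_ratio = association_measures(30, 70, 10, 90)
print(f"lift={lift:.3f}  RR={rr:.3f}  OR={odds_ratio:.3f}")
```

With these made-up counts the three measures all exceed 1, but the odds ratio and relative risk sit further from 1 than lift does.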

RESULTS

As the exposure-outcome association strengthens, the odds ratio and relative risk move away from 1 faster numerically than lift does, i.e. |log(odds ratio)| ≥ |log(relative risk)| ≥ |log(lift)|. In addition, lift is bounded by the smaller of the inverse probabilities of outcome and exposure, i.e. lift ≤ min(1/P(O), 1/P(E)). Unlike the relative risk and odds ratio, lift depends on the exposure prevalence for fixed outcomes. For example, when an exposure A and a less prevalent exposure B have the same relative risk (above 1) for an outcome, exposure A has a lower lift than B.
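The bound and the prevalence dependence stated above can be recovered from the standard definitions of lift and relative risk. The derivation below is a sketch under those definitions; the abstract does not give the paper's equations, so the form shown here may differ from theirs. Here $\bar{E}$ denotes the unexposed group, a notation assumed for this sketch.

```latex
\begin{align*}
\text{lift} &= \frac{P(O \cap E)}{P(O)\,P(E)} = \frac{P(O \mid E)}{P(O)}, \\
\text{RR} &= \frac{P(O \mid E)}{P(O \mid \bar{E})}, \qquad
P(O) = P(E)\,P(O \mid E) + \bigl(1 - P(E)\bigr)\,P(O \mid \bar{E}), \\
\text{lift} &= \frac{\text{RR}}{1 + P(E)\,(\text{RR} - 1)}
\quad\Longleftrightarrow\quad
\text{RR} = \frac{\text{lift}\,\bigl(1 - P(E)\bigr)}{1 - P(E)\,\text{lift}}, \\
P(O \cap E) &\le \min\bigl(P(O),\, P(E)\bigr)
\;\Longrightarrow\;
\text{lift} \le \min\!\left(\frac{1}{P(O)},\, \frac{1}{P(E)}\right).
\end{align*}
```

For a fixed RR above 1, the denominator $1 + P(E)(\text{RR} - 1)$ grows with $P(E)$, so a more prevalent exposure yields a lower lift. As an illustrative calculation, RR = 3 gives lift = 1.5 when P(E) = 0.5 and lift = 2.5 when P(E) = 0.1.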

CONCLUSIONS

Lift, relative risk, and odds ratio are positively correlated and share the same null value. However, lift depends on the exposure prevalence, and thus is not straightforward to interpret or to use to compare association strength. Tools are provided to obtain the relative risk and odds ratio from lift.
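A conversion of this kind can be sketched in a few lines, assuming the exposure prevalence P(E) and outcome prevalence P(O) are known alongside lift. The functions below follow from the standard definitions of the three measures; they are not the authors' published tools, and the function and parameter names are hypothetical.

```python
def lift_to_relative_risk(lift, p_exposure):
    """Relative risk from lift and exposure prevalence P(E).

    Uses RR = lift * (1 - P(E)) / (1 - P(E) * lift), which follows from
    the standard definitions; requires 0 < P(E) < 1 and lift < 1 / P(E).
    """
    denominator = 1.0 - p_exposure * lift
    if not 0.0 < p_exposure < 1.0 or denominator <= 0.0:
        raise ValueError("need 0 < P(E) < 1 and lift < 1 / P(E)")
    return lift * (1.0 - p_exposure) / denominator


def lift_to_odds_ratio(lift, p_exposure, p_outcome):
    """Odds ratio from lift, exposure prevalence P(E), and outcome prevalence P(O)."""
    if not 0.0 < p_exposure < 1.0:
        raise ValueError("need 0 < P(E) < 1")
    risk_exposed = lift * p_outcome  # P(O | E)
    # P(O | not E), from the law of total probability.
    risk_unexposed = (p_outcome - p_exposure * risk_exposed) / (1.0 - p_exposure)
    if not (0.0 < risk_exposed < 1.0 and 0.0 < risk_unexposed < 1.0):
        raise ValueError("inputs imply a conditional risk outside (0, 1)")
    odds_exposed = risk_exposed / (1.0 - risk_exposed)
    odds_unexposed = risk_unexposed / (1.0 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Made-up inputs for illustration: lift = 1.5, P(E) = 0.5, P(O) = 0.2.
print(round(lift_to_relative_risk(1.5, 0.5), 3))
print(round(lift_to_odds_ratio(1.5, 0.5, 0.2), 3))
```

Note that the relative risk needs only lift and P(E), while the odds ratio also needs P(O). Two different lift values can map to the same relative risk when the exposure prevalence differs, which is why lift alone is hard to compare across exposures.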
