Toti Giulia, Vilalta Ricardo, Lindner Peggy, Lefer Barry, Macias Charles, Price Daniel
Department of Computer Science, University of Houston, Philip Guthrie Hoffman Hall, 3551 Cullen Blvd., Room 501, Houston, TX 77204-3010, USA.
Artif Intell Med. 2016 Nov;74:44-52. doi: 10.1016/j.artmed.2016.11.003. Epub 2016 Nov 25.
Traditional studies of the effects of outdoor pollution on asthma have been criticized for questionable statistical validity and for their inefficacy in exploring the effects of multiple air pollutants, alone and in combination. Association rule mining (ARM), an easily interpretable method suited to analyzing the effects of multiple exposures, could be of use, but the traditional interest metrics of support and confidence need to be replaced with metrics that focus on the risk variations caused by different exposures.
We present an ARM-based methodology that produces rules associated with relevant odds ratios and limits the number of final rules even at very low support levels (0.5%), thanks to post-pruning criteria that limit rule redundancy and control for statistical significance. The methodology has been applied to a case-crossover study to explore the effects of multiple air pollutants on risk of asthma in pediatric subjects.
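As a rough illustration of what attaching an odds ratio to a rule can look like, the sketch below scores one candidate rule from a 2x2 table and applies a simple support and significance filter. It is not the authors' implementation: the function names `rule_odds_ratio` and `keep_rule`, the record layout, the unmatched 2x2 odds ratio with a Woolf (log-scale) confidence interval, and the "interval excludes 1" criterion are all assumptions made for illustration. A case-crossover design would normally call for a matched analysis, and the redundancy pruning mentioned above is not shown. Only the 0.5% support threshold comes from the text.

```python
import math


def rule_odds_ratio(records, rule, z=1.96):
    """Score one candidate rule (hypothetical helper, not from the paper).

    records: list of (exposures, is_case) pairs, where exposures is a set of
             exposure labels and is_case marks a case (True) or control (False).
    rule:    set of exposure labels that must all be present.
    Returns (odds_ratio, ci_low, ci_high, support), or None when a cell of the
    2x2 table is empty and the odds ratio is undefined.
    """
    rule = frozenset(rule)
    a = b = c = d = 0  # exposed cases, exposed controls, unexposed cases, unexposed controls
    for exposures, is_case in records:
        exposed = rule <= set(exposures)
        if exposed and is_case:
            a += 1
        elif exposed:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    if 0 in (a, b, c, d):
        return None
    odds_ratio = (a * d) / (b * c)
    # Woolf's method: standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - z * se)
    ci_high = math.exp(math.log(odds_ratio) + z * se)
    support = (a + b) / len(records)  # share of records matching the rule
    return odds_ratio, ci_low, ci_high, support


def keep_rule(stats, min_support=0.005):
    """Assumed filter: enough support and a confidence interval above 1."""
    if stats is None:
        return False
    _, ci_low, _, support = stats
    return support >= min_support and ci_low > 1.0
```

With a list of `(exposures, is_case)` records, `keep_rule(rule_odds_ratio(records, {"day0_NO2", "day1_O3"}))` would say whether that hypothetical rule passes both filters; the exposure labels here are placeholders, not the study's encoding.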
We identified 27 rules with interesting odds ratios among more than 10,000 that met the required support. The only rule involving a single chemical is exposure to ozone on the day before the reported asthma attack (OR=1.14). The remaining 26 rules combine several exposures; they highlight the limitations of air quality policies based on single-pollutant thresholds and suggest that exposure to mixtures of chemicals is more harmful, with odds ratios as high as 1.54 (for the combination day0 SO2, day0 NO, day0 NO2, day1 PM).
The proposed method can be used to analyze risk variations caused by single and multiple exposures. The method is reliable and requires fewer assumptions on the data than parametric approaches. Rules including more than one pollutant highlight interactions that deserve further investigation, while helping to limit the search field.