Huang Hongtai, Tornero-Velez Rogelio, Barzyk Timothy M
Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, Tennessee 37830, USA.
National Exposure Research Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27709, USA.
J Expo Sci Environ Epidemiol. 2017 Nov;27(6):544-550. doi: 10.1038/jes.2017.15. Epub 2017 Sep 13.
Association rule mining (ARM) has been widely used to identify associations between entities in many fields. Although some studies have used it to analyze the relationship between chemicals and human health effects, fewer have used it to identify and quantify associations between environmental and social stressors. Socio-demographic variables were generated from U.S. Census tract-level income, race/ethnicity population percentage, education level, and age information in the 2010-2014 5-Year Summary Files of the American Community Survey (ACS) database. Chemical variables were generated from the 2011 National-Scale Air Toxics Assessment (NATA) census tract-level air pollutant exposure concentration data. Six mobile- and industrial-source pollutants were chosen for analysis: acetaldehyde, benzene, cyanide, particulate matter components of diesel engine emissions (diesel PM), toluene, and 1,3-butadiene. ARM was then applied to quantify and visualize the associations between the chemical and socio-demographic variables. Census tracts with a high percentage of racial/ethnic minorities and populations with low income tended to have estimated chemical exposure concentrations in the fourth (highest) quartile, especially for diesel PM, 1,3-butadiene, and toluene. In contrast, census tracts with an average population age of 40-50 years, a low percentage of racial/ethnic minorities, and moderate income levels were more likely to have estimated chemical exposure concentrations in the first (lowest) quartile. Unsupervised data mining methods can be used to evaluate potential associations between environmental inequalities and social disparities, and can support public health decision-making.
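As a rough illustration of the approach described above, the following Python sketch bins tract-level variables into quartiles, treats each census tract as a "transaction" of quartile labels, and computes the standard ARM metrics (support, confidence, lift) for one candidate rule. It is not the authors' implementation: the column names (`pct_minority`, `median_income`, `diesel_pm`), the helper functions, and the eight data rows are all hypothetical and made up for illustration, and the exhaustive rule search performed by algorithms such as Apriori is omitted.

```python
def quartile_label(value, sorted_values):
    """Return 'Q1'..'Q4' from the share of values strictly below `value`."""
    share_below = sum(v < value for v in sorted_values) / len(sorted_values)
    return "Q" + str(min(int(share_below * 4) + 1, 4))


def to_transactions(rows):
    """Turn numeric tract records into sets of 'variable=quartile' items."""
    columns = list(rows[0].keys())
    sorted_cols = {c: sorted(r[c] for r in rows) for c in columns}
    return [
        frozenset(f"{c}={quartile_label(r[c], sorted_cols[c])}" for c in columns)
        for r in rows
    ]


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    n_a = sum(a <= t for t in transactions)          # tracts matching antecedent
    n_c = sum(c <= t for t in transactions)          # tracts matching consequent
    n_ac = sum((a | c) <= t for t in transactions)   # tracts matching both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy data: one dict per census tract (values are invented).
tracts = [
    {"pct_minority": 5,  "median_income": 90, "diesel_pm": 0.1},
    {"pct_minority": 10, "median_income": 80, "diesel_pm": 0.2},
    {"pct_minority": 20, "median_income": 70, "diesel_pm": 0.3},
    {"pct_minority": 30, "median_income": 60, "diesel_pm": 0.4},
    {"pct_minority": 50, "median_income": 50, "diesel_pm": 0.5},
    {"pct_minority": 60, "median_income": 40, "diesel_pm": 0.6},
    {"pct_minority": 80, "median_income": 30, "diesel_pm": 0.7},
    {"pct_minority": 90, "median_income": 20, "diesel_pm": 0.8},
]

transactions = to_transactions(tracts)

# Candidate rule: high minority share and low income -> highest diesel PM quartile.
support, confidence, lift = rule_metrics(
    transactions,
    antecedent={"pct_minority=Q4", "median_income=Q1"},
    consequent={"diesel_pm=Q4"},
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than expected if they were independent, which is the sense in which a rule "quantifies" an association. The toy data are perfectly correlated by construction, so the metrics here carry no empirical meaning.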