Kershenbaum Anne D, Langston Michael A, Levine Robert S, Saxton Arnold M, Oyana Tonny J, Kilbourne Barbara J, Rogers Gary L, Gittner Lisaann S, Baktash Suzanne H, Matthews-Juarez Patricia, Juarez Paul D
Department of Public Health, University of Tennessee, Knoxville, TN 37996, USA.
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA.
Int J Environ Res Public Health. 2014 Nov 28;11(12):12346-66. doi: 10.3390/ijerph111212346.
Recent advances in informatics technology have made it possible to integrate, manipulate, and analyze variables from a wide range of scientific disciplines, allowing for the examination of complex social problems such as health disparities. This study used 589 county-level variables to identify and compare geographical variation in high and low preterm birth rates. Data were collected from a number of publicly available sources, bringing together natality outcomes with attributes of the natural, built, social, and policy environments. The singleton early premature birth rate in counties with populations over 100,000 provided the dependent variable. Graph theoretical techniques were used to identify a wide range of predictor variables from various domains, including black proportion, obesity and diabetes, sexually transmitted infection rates, mother's age, income, marriage rates, pollution, and temperature, among others. Dense subgraphs (paracliques) representing groups of highly correlated variables were resolved into latent factors, which were then used to build a regression model explaining prematurity (R-squared = 76.7%). Two lists of counties with large positive and large negative residuals, indicating unusually high or low prematurity rates given their circumstances, may serve as a starting point for identifying ways to intervene and reduce health disparities in preterm births.
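The abstract outlines a pipeline: build a correlation graph over county variables, find dense subgraphs (paracliques), collapse each into a latent factor, regress the outcome on the factors, and rank counties by residual. It gives no code, so the following Python sketch only illustrates that sequence on toy data. The correlation threshold, the `glom` tolerance, the simplified paraclique growth rule, the mean of sign-aligned z-scores used in place of formal factor extraction, and the single-factor regression are all assumptions for illustration, not the authors' settings. Every variable name and value is hypothetical.

```python
import math
from itertools import combinations

# Illustrative sketch only: threshold, glom, factor method, and data are assumptions.

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_graph(data, threshold):
    """Vertices are variables; an edge joins two variables when |r| >= threshold."""
    adj = {v: set() for v in data}
    for u, v in combinations(data, 2):
        if abs(pearson(data[u], data[v])) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def maximum_clique(adj):
    """Plain Bron-Kerbosch search; adequate only for small illustrative graphs."""
    best = set()
    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return best

def paraclique(adj, glom=1):
    """Simplified paraclique (assumed variant): start from a maximum clique, then
    add any outside vertex non-adjacent to at most `glom` current members."""
    core = maximum_clique(adj)
    changed = True
    while changed:
        changed = False
        for v in sorted(set(adj) - core):
            if len(core - adj[v]) <= glom:
                core.add(v)
                changed = True
    return core

def zscores(x):
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / sd for a in x]

def latent_factor(data, members):
    """Assumed stand-in for factor extraction: mean of sign-aligned z-scores."""
    members = sorted(members)
    anchor = data[members[0]]
    cols = []
    for v in members:
        z = zscores(data[v])
        if pearson(anchor, data[v]) < 0:  # flip inversely correlated variables
            z = [-a for a in z]
        cols.append(z)
    return [sum(vals) / len(vals) for vals in zip(*cols)]

def ols_residuals(x, y):
    """Single-predictor least squares; returns (intercept, slope, R^2, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot, resid

# Hypothetical toy data: six counties, four made-up variables, made-up outcome.
data = {
    "var_a": [1, 2, 3, 4, 5, 6],
    "var_b": [2, 4, 5, 8, 10, 12],
    "var_c": [6, 5, 4, 3, 2, 1],
    "var_d": [3, 1, 4, 1, 5, 9],
}
outcome = [10, 11, 13, 14, 15, 18]

members = paraclique(correlation_graph(data, threshold=0.9), glom=1)
factor = latent_factor(data, members)
intercept, slope, r2, resid = ols_residuals(factor, outcome)
# County indices ordered from most negative to most positive residual.
ranked = sorted(range(len(resid)), key=lambda i: resid[i])
print(sorted(members), round(r2, 3), ranked)
```

Counties at either end of `ranked` play the role of the two residual lists described in the abstract: observed rates well above or below what the model predicts. With many paracliques, a multiple regression over several factors would replace the single-predictor fit used here for brevity.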