H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA.
BMC Bioinformatics. 2011 Jan 27;12:37. doi: 10.1186/1471-2105-12-37.
In standard regression analyses of covariate interactions and group associations, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretation. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with those of logistic regression (LR) modeling, which represents a parametric approach. The SL technique consisted of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison.
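The combination of a kernel mapping with a perceptron can be illustrated with a minimal sketch. The abstract does not specify the kernel, the network, or the simulated data, so everything below is an assumption made for illustration only: a Gaussian (RBF) kernel, a plain kernel perceptron update, and a four-point XOR-style toy dataset whose classes cannot be separated by a straight line. The function names (`rbf_kernel`, `linear_kernel`, `train_kernel_perceptron`) are hypothetical and are not taken from the study.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel; gamma is an assumed, untuned width parameter.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def linear_kernel(a, b):
    # Inner product plus 1, so the linear model also has a bias term.
    return sum(ai * bi for ai, bi in zip(a, b)) + 1.0

def train_kernel_perceptron(X, y, kernel, epochs=100):
    # alpha[i] counts how often training point i was misclassified.
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, xi in enumerate(X):
            score = sum(a * yj * kernel(xj, xi) for a, yj, xj in zip(alpha, y, X))
            if y[i] * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alpha

def predict(X, y, alpha, kernel, x_new):
    score = sum(a * yj * kernel(xj, x_new) for a, yj, xj in zip(alpha, y, X))
    return 1 if score > 0 else -1

def accuracy(X, y, alpha, kernel):
    hits = sum(predict(X, y, alpha, kernel, xi) == yi for xi, yi in zip(X, y))
    return hits / len(X)

# Toy data (assumed): outcome depends on the two exposures in an XOR pattern.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [-1, 1, 1, -1]

rbf_alpha = train_kernel_perceptron(X, y, rbf_kernel)
lin_alpha = train_kernel_perceptron(X, y, linear_kernel)
rbf_acc = accuracy(X, y, rbf_alpha, rbf_kernel)
lin_acc = accuracy(X, y, lin_alpha, linear_kernel)
print("RBF kernel accuracy:", rbf_acc)
print("Linear kernel accuracy:", lin_acc)
```

On this toy data the RBF version separates all four points, whereas no linear boundary can, which mirrors the contrast between kernel-based and linear separation boundaries described above.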
The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR.
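The abstract does not describe how the SL method was modified to yield odds ratios. As a generic illustration only, an odds ratio can be computed from any model that outputs outcome probabilities by comparing the odds at two exposure levels. For a logistic model this quantity equals the exponentiated coefficient. The intercept and coefficient values below are hypothetical, as are the helper names.

```python
import math

def odds(p):
    # Odds corresponding to a probability p in (0, 1).
    return p / (1.0 - p)

def odds_ratio(p_exposed, p_unexposed):
    # Ratio of the odds of disease between exposed and unexposed groups.
    return odds(p_exposed) / odds(p_unexposed)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logistic model: logit(p) = b0 + b1 * exposure
b0, b1 = -1.0, 0.7

p_unexposed = logistic(b0)       # exposure = 0
p_exposed = logistic(b0 + b1)    # exposure = 1

or_from_probs = odds_ratio(p_exposed, p_unexposed)
or_from_coef = math.exp(b1)      # standard LR interpretation of b1

print(or_from_probs, or_from_coef)
```

The two values agree for a logistic model. For a nonlinear model the probability-based ratio can vary with the exposure levels being compared, which is one reason such a model may capture relationships that a single LR coefficient does not.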
The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.