Wolf Bethany J, Slate Elizabeth H, Hill Elizabeth G
Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29464.
Department of Statistics, Florida State University, Tallahassee, FL 32306.
Comput Stat Data Anal. 2015 Feb 1;82:152-163. doi: 10.1016/j.csda.2014.08.013.
In medicine, it is often useful to stratify patients according to disease risk, severity, or response to therapy. Since many diseases arise from complex gene-gene and gene-environment interactions, patient strata may be defined by combinations of genetic and environmental factors. Traditional statistical methods require interactions to be specified in advance, which makes it difficult to identify high-order interactions. Machine learning methods, by contrast, can model complex interactions, but the resulting models are often difficult to interpret in a clinical setting. Logic regression (LR) models a binary outcome using logical combinations of binary predictors, yielding easily interpretable models. However, LR, as currently available, cannot model ordinal responses. This paper extends LR to model an ordinal response; the resulting method is called Ordinal Logic Regression (OLR). Several simulations comparing OLR with Classification and Regression Trees (CART) demonstrate that OLR is superior to CART for identifying variable interactions associated with an ordinal response. OLR is applied to data from a study of adult periodontitis to determine associations of genetic and health factors with disease severity.
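To make "logical combinations of binary predictors" concrete, the sketch below evaluates one Boolean term over three binary predictors and lists which predictor patterns fall into the stratum it defines. The term `(X1 AND NOT X2) OR X3` and the names `logic_term`, `rows`, and `n_true` are hypothetical and chosen only for illustration; the abstract does not give the fitted terms or the search procedure, and this is not the authors' OLR algorithm.

```python
from itertools import product


def logic_term(x1, x2, x3):
    """Hypothetical Boolean term (X1 AND NOT X2) OR X3, returned as 0 or 1."""
    return int((x1 and not x2) or x3)


# Enumerate every pattern of the three binary predictors together with the
# value of the term; patterns with value 1 form one interpretable stratum.
rows = [
    (x1, x2, x3, logic_term(x1, x2, x3))
    for x1, x2, x3 in product((0, 1), repeat=3)
]

# Number of predictor patterns for which the term is true.
n_true = sum(value for _, _, _, value in rows)

for x1, x2, x3, value in rows:
    print(f"X1={x1} X2={x2} X3={x3} -> L={value}")
```

In logic regression, terms of this kind enter a regression model as binary covariates, so each fitted term reads directly as a rule such as "has factor 1 but not factor 2, or has factor 3".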