Department of Biostatistics, Vanderbilt University, Nashville, Tennessee.
Institut für Epidemiologie, Biostatistik und Prävention, Universität Zürich, Zürich, Switzerland.
Stat Med. 2020 Feb 28;39(5):562-576. doi: 10.1002/sim.8425. Epub 2019 Dec 6.
Continuous response variables are often transformed to meet modeling assumptions, but choosing the transformation can be challenging. Two transformation models have recently been proposed: semiparametric cumulative probability models (CPMs) and parametric most likely transformation models (MLTs). Both approaches model the cumulative distribution function and require specifying a link function, which implicitly assumes that the responses follow a known distribution after some monotonic transformation. However, the two approaches estimate the transformation differently. With CPMs, an ordinal regression model is fit, which essentially treats each continuous response as a unique category and therefore nonparametrically estimates the transformation; CPMs are semiparametric linear transformation models. In contrast, with MLTs, the transformation is parameterized using flexible basis functions. Conditional expectations and quantiles are readily derived from both methods on the response variable's original scale. We compare the two methods with extensive simulations. We find that both methods generally perform well with moderate and large sample sizes. MLTs slightly outperform CPMs with small sample sizes under correctly specified models. CPMs tend to be somewhat more robust to model misspecification and outcome rounding. Except in the simplest situations, both methods outperform basic transformation approaches commonly used in practice. We apply both methods to an HIV biomarker study.
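The shared structure of the two models can be sketched in a few lines. The notation below is an assumption introduced for illustration and is not taken from the abstract: H is the unknown increasing transformation, \varepsilon is an error term with known distribution F_\varepsilon, g = F_\varepsilon^{-1} is the link function, \alpha = H^{-1}, and a(y) is a vector of basis functions (for example Bernstein polynomials) with coefficients \vartheta. Under those assumptions, one common way to write the model is:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Assumed latent formulation: Y is an increasing transformation H of a
% linear predictor plus an error with known distribution F_eps.
\begin{align*}
  Y &= H(\beta^{\top} X + \varepsilon), \qquad \varepsilon \sim F_{\varepsilon}, \\
  F(y \mid X) &= P(Y \le y \mid X)
     = P\bigl(\varepsilon \le H^{-1}(y) - \beta^{\top} X\bigr)
     = F_{\varepsilon}\bigl(H^{-1}(y) - \beta^{\top} X\bigr), \\
  g\bigl(F(y \mid X)\bigr) &= \alpha(y) - \beta^{\top} X,
     \qquad g = F_{\varepsilon}^{-1}, \quad \alpha = H^{-1}.
\end{align*}
% The two approaches differ only in how alpha is estimated.
\begin{align*}
  \text{CPM:} \quad & \alpha \text{ is a step function with one intercept
     per cut point between distinct observed responses,} \\
  \text{MLT:} \quad & \alpha(y) = a(y)^{\top} \vartheta
     \text{ with } \vartheta \text{ constrained so that } \alpha
     \text{ is increasing.}
\end{align*}
% Quantities on the original scale follow from the fitted distribution.
\begin{align*}
  Q_{p}(Y \mid X) &= \alpha^{-1}\bigl(g(p) + \beta^{\top} X\bigr), \\
  E[Y \mid X] &= \int y \, dF(y \mid X).
\end{align*}
\end{document}
```

The quantile expression follows by solving g(F(y | X)) = g(p) for y. For a CPM, F(y | X) is a step function, so the expectation reduces to a finite sum over the distinct observed responses and the quantile is read off the estimated step function, possibly with interpolation.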