Division of Biostatistics, Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden.
Applied Statistics Research Unit, Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria.
Stat Med. 2022 Jul 10;41(15):2768-2785. doi: 10.1002/sim.9383. Epub 2022 Mar 24.
We propose a new method for multivariate response regression and covariance estimation when elements of the response vector are of mixed types, for example some continuous and some discrete. Our method is based on a model that assumes the observable mixed-type response vector is connected to a latent multivariate normal response linear regression through a link function. We explore the properties of this model and show that its parameters are identifiable under reasonable conditions. We impose no parametric restrictions on the covariance of the latent normal other than positive definiteness, thereby avoiding assumptions about unobservable variables that can be difficult to verify in practice. To accommodate this generality, we propose a novel algorithm for approximate maximum likelihood estimation that works "off-the-shelf" with many different combinations of response types and that scales well in the dimension of the response vector. Our method typically gives better predictions and parameter estimates than fitting separate models for the different response types, and it allows for approximate likelihood ratio testing of relevant hypotheses such as independence of responses. The usefulness of the proposed method is illustrated in simulations and in one biomedical and one genomic data example.
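To make the data-generating idea concrete, the following minimal sketch simulates mixed-type responses driven by a latent multivariate normal linear regression. It is an illustration only, not the authors' estimation algorithm. The specific link choices (identity for the continuous response, logit for the binary response, log for the count response), the function name `simulate_mixed_responses`, and all parameter values are assumptions made here for illustration; the abstract does not specify them.

```python
import numpy as np


def simulate_mixed_responses(X, B, Sigma, rng):
    """Simulate one continuous, one binary, and one count response per row of X.

    Hypothetical helper: the three link choices below are illustrative
    assumptions, not taken from the abstract.
    """
    n = X.shape[0]
    # Latent multivariate normal linear regression: W_i ~ N(B^T x_i, Sigma).
    noise = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    W = X @ B + noise
    # Continuous response: the latent variable is observed directly (identity link).
    y_cont = W[:, 0]
    # Binary response: Bernoulli with success probability logistic(W) (logit link).
    y_bin = rng.binomial(1, 1.0 / (1.0 + np.exp(-W[:, 1])))
    # Count response: Poisson with mean exp(W) (log link).
    y_count = rng.poisson(np.exp(W[:, 2]))
    return np.column_stack([y_cont, y_bin, y_count]), W


rng = np.random.default_rng(0)
n = 500
# Design matrix with an intercept and one covariate.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Regression coefficients: one column per response (arbitrary example values).
B = np.array([[0.5, -0.2, 0.1],
              [1.0, 0.8, 0.3]])
# Unstructured, positive definite latent covariance (arbitrary example values).
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 0.5]])
Y, W = simulate_mixed_responses(X, B, Sigma, rng)
```

The off-diagonal entries of `Sigma` are what induce dependence between the continuous, binary, and count responses; setting them to zero corresponds to the independence hypothesis mentioned in the abstract.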