Lock Eric F, Li Gen
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455.
Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032.
Electron J Stat. 2018;12(1):1150-1180. doi: 10.1214/18-EJS1421. Epub 2018 Mar 27.
We describe a probabilistic PARAFAC/CANDECOMP (CP) factorization for multiway (i.e., tensor) data that incorporates auxiliary covariates, SupCP. SupCP generalizes the supervised singular value decomposition (SupSVD) for vector-valued observations, to allow for observations that have the form of a matrix or higher-order array. Such data are increasingly encountered in biomedical research and other fields. We use a novel likelihood-based latent variable representation of the CP factorization, in which the latent variables are informed by additional covariates. We give conditions for identifiability, and develop an EM algorithm for simultaneous estimation of all model parameters. SupCP can be used for dimension reduction, capturing latent structures that are more accurate and interpretable due to covariate supervision. Moreover, SupCP specifies a full probability distribution for a multiway data observation with given covariate values, which can be used for predictive modeling. We conduct comprehensive simulations to evaluate the SupCP algorithm. We apply it to a facial image database with facial descriptors (e.g., smiling / not smiling) as covariates, and to a study of amino acid fluorescence. Software is available at https://github.com/lockEF/SupCP.
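To make the idea of a covariate-informed CP factorization concrete, the following is a minimal simulation sketch, not the authors' implementation (their software is at the URL given in the abstract). It assumes one plausible form of the model: sample scores U depend linearly on covariates Y plus Gaussian latent noise, and each observation is a sum of rank-one terms built from U and two loading matrices. The function name `simulate_supcp`, the coefficient matrix `B`, the noise scales, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_supcp(n=50, dims=(8, 6), q=2, r=3, sigma=0.1, seed=0):
    """Draw matrix-valued observations from a covariate-informed CP model.

    Assumed (illustrative) form: U = Y @ B + F, and
    X[i] = sum_k U[i, k] * outer(V1[:, k], V2[:, k]) + noise.
    """
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, q))              # covariates, one row per sample
    B = rng.normal(size=(q, r))              # hypothetical covariate effects
    F = rng.normal(scale=0.5, size=(n, r))   # latent variation not explained by Y
    U = Y @ B + F                            # sample scores informed by covariates
    V1 = rng.normal(size=(dims[0], r))       # loadings for the first data mode
    V2 = rng.normal(size=(dims[1], r))       # loadings for the second data mode
    # Rank-r CP structure across the sample mode and the two data modes.
    signal = np.einsum("ik,ak,bk->iab", U, V1, V2)
    X = signal + rng.normal(scale=sigma, size=signal.shape)  # i.i.d. Gaussian error
    return X, Y, U, V1, V2, signal

X, Y, U, V1, V2, signal = simulate_supcp()
print(X.shape)  # one 8 x 6 matrix per sample
```

Under these assumptions, the covariates influence the data only through the low-dimensional scores U, which is what allows supervision to shape the recovered latent structure. Fitting such a model (for example, by an EM algorithm as the abstract describes) is not shown here.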