Lee Seokho, Huang Jianhua Z, Hu Jianhua
Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
Ann Appl Stat. 2010 Sep 1;4(3):1579-1601. doi: 10.1214/10-AOAS327SUPP.
We develop a new principal components analysis (PCA) type dimension reduction method for binary data. Unlike standard PCA, which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced into the principal component (PC) loading vectors to enhance interpretability and to make extraction of the principal components more stable. Our sparse PCA is formulated as an optimization problem whose criterion function is motivated by the penalized Bernoulli likelihood. A Majorization-Minimization algorithm is developed to solve the optimization problem efficiently. The effectiveness of the proposed sparse logistic PCA method is illustrated by an application to a single nucleotide polymorphism data set and by a simulation study.
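The idea of minimizing a penalized Bernoulli likelihood by majorization can be sketched in a few lines of NumPy. This is an illustrative rank-1 sketch, not the authors' implementation: it assumes the logit matrix has the form theta_ij = mu_j + a_i * b_j with a unit-norm score vector a, an L1 penalty lam * sum|b_j| on the loadings, and the uniform quadratic bound (curvature 1/4) on the negative logistic log-likelihood as the majorizer. The function names, the penalty scaling, and the choice of bound are assumptions made for this example.

```python
import numpy as np


def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def soft_threshold(c, t):
    # Elementwise soft-thresholding, the minimizer of 0.5*(b - c)^2 + t*|b|.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def objective(Y, mu, a, b, lam):
    # Negative Bernoulli log-likelihood on the logit scale plus L1 penalty.
    Q = 2.0 * Y - 1.0
    theta = mu[None, :] + np.outer(a, b)
    return np.logaddexp(0.0, -Q * theta).sum() + lam * np.abs(b).sum()


def sparse_logistic_pca_rank1(Y, lam, n_iter=50, seed=0):
    """Hypothetical rank-1 sketch of an MM iteration for sparse logistic PCA."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    Q = 2.0 * Y - 1.0                      # recode {0, 1} as {-1, +1}
    mu = np.zeros(d)
    a = rng.standard_normal(n)
    a /= np.linalg.norm(a)                 # scores constrained to unit norm
    b = 0.1 * rng.standard_normal(d)
    history = [objective(Y, mu, a, b, lam)]
    for _ in range(n_iter):
        # Majorization: build the working response of the quadratic surrogate
        # (1/8) * ||X - theta||^2 + lam * ||b||_1 at the current iterate.
        theta = mu[None, :] + np.outer(a, b)
        X = theta + 4.0 * Q * (1.0 - sigmoid(Q * theta))
        # Minimization: blockwise exact updates of the surrogate.
        mu = (X - np.outer(a, b)).mean(axis=0)
        Xc = X - mu[None, :]
        b = soft_threshold(a @ Xc, 4.0 * lam)   # sparse loadings
        v = Xc @ b
        norm_v = np.linalg.norm(v)
        if norm_v > 0:                          # keep old scores if b is all zero
            a = v / norm_v
        history.append(objective(Y, mu, a, b, lam))
    return mu, a, b, history
```

Because each block update minimizes the same surrogate and the surrogate touches the objective at the current iterate, the recorded objective values should not increase from one iteration to the next. Larger values of lam set more loadings exactly to zero.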