Yoshimasa Uematsu, Yingying Fan, Kun Chen, Jinchi Lv, Wei Lin
Yoshimasa Uematsu is Assistant Professor, Department of Economics and Management, Tohoku University, Sendai 980-8576, Japan. Yingying Fan is Dean's Associate Professor in Business Administration, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089. Kun Chen is Associate Professor, Department of Statistics, University of Connecticut, Storrs, CT 06269. Jinchi Lv is Kenneth King Stonier Chair in Business Administration and Professor, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089. Wei Lin is Assistant Professor, School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China 100871.
IEEE Trans Inf Theory. 2019 Aug;65(8):4924-4939. doi: 10.1109/tit.2019.2909889. Epub 2019 Apr 11.
Many modern big data applications feature large numbers of both responses and predictors. Understanding the large-scale response-predictor association network structure through layers of sparse latent factors ranked by importance can yield better statistical efficiency and scientific insight. Yet sparsity and orthogonality have long been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR), which learns the underlying association networks via the sparse singular value decomposition with orthogonality-constrained optimization and has broad applications to both unsupervised and supervised learning tasks, such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure that characterize its theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with an established convergence property. Both the computational and theoretical advantages of our procedure are demonstrated through several simulation and real data examples.
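To make the setup described above concrete, one may write the multi-response regression model as Y ≈ XC and factor the coefficient matrix as C = UDV^T with orthonormal factor matrices and a diagonal matrix of singular values. An illustrative penalized, orthogonality-constrained least-squares criterion of the kind the abstract describes is

\min_{U, D, V} \; \tfrac{1}{2}\, \| Y - X U D V^\top \|_F^2 + \lambda_d \| D \|_1 + \lambda_u \rho_u(U) + \lambda_v \rho_v(V) \quad \text{subject to} \quad U^\top U = V^\top V = I_r,

where r is the assumed rank, \rho_u and \rho_v are sparsity-inducing penalties (e.g., entrywise \ell_1 norms), and the \lambda's are tuning parameters. This display is a sketch assembled from the description above; the specific penalties and constraints used in the paper may differ.

For further concreteness, the following minimal numerical sketch illustrates the objects involved (sparse, orthonormal, rank-r factors of a multi-response coefficient matrix). It uses a naive two-step surrogate, an ordinary least-squares fit followed by a truncated SVD with soft-thresholding and re-orthonormalization; it is not the SOFAR estimator or algorithm from the paper, and all names and thresholds below are ad hoc choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-response regression Y = X C + E with a sparse rank-2 coefficient
# matrix C = U D V^T (orthonormal U, V; diagonal D with positive entries).
n, p, q, r = 200, 20, 15, 2
X = rng.standard_normal((n, p))
U_true = np.zeros((p, r)); U_true[0:3, 0] = 1.0; U_true[3:6, 1] = 1.0
V_true = np.zeros((q, r)); V_true[0:4, 0] = 1.0; V_true[4:8, 1] = 1.0
U_true, _ = np.linalg.qr(U_true)
V_true, _ = np.linalg.qr(V_true)
C_true = U_true @ np.diag([5.0, 3.0]) @ V_true.T
Y = X @ C_true + 0.3 * rng.standard_normal((n, q))

def soft_threshold(A, tau):
    """Entrywise soft-thresholding, the proximal map of the scaled l1 penalty."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

# Step 1: unpenalized least-squares estimate of the coefficient matrix.
C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Step 2: truncated SVD of the fit, then sparsify the factors and restore
# orthonormality with a QR step (illustration only, not the SOFAR algorithm).
U_hat, d_hat, Vt_hat = np.linalg.svd(C_ols, full_matrices=False)
U_sparse = soft_threshold(U_hat[:, :r], 0.10)
V_sparse = soft_threshold(Vt_hat[:r, :].T, 0.10)
U_sparse, _ = np.linalg.qr(U_sparse)
V_sparse, _ = np.linalg.qr(V_sparse)

# Recover the singular values, fixing the sign ambiguity introduced by SVD/QR.
M = U_sparse.T @ C_ols @ V_sparse          # nearly diagonal if the factors align
signs = np.sign(np.diag(M))
V_sparse = V_sparse * signs
D_hat = np.diag(np.abs(np.diag(M)))

C_hat = U_sparse @ D_hat @ V_sparse.T
print("relative error of the sparse rank-r reconstruction:",
      np.linalg.norm(C_hat - C_true) / np.linalg.norm(C_true))
```

The two-step surrogate only mimics the ingredients (low rank, sparsity, orthogonality); the paper's procedure instead solves the constrained optimization directly and comes with the nonasymptotic error bounds described in the abstract.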