Xiao Luo, Zipunnikov Vadim, Ruppert David, Crainiceanu Ciprian
Department of Biostatistics, Johns Hopkins University, Baltimore, MD.
Department of Statistical Science and School of Operations Research and Information Engineering, Cornell University, Ithaca, NY.
Stat Comput. 2016 Jan 1;26(1):409-421. doi: 10.1007/s11222-014-9485-x. Epub 2014 Jun 27.
We propose two fast covariance smoothing methods and associated software that scale linearly with the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension greater than 500; a recently introduced sandwich smoother is an exception, but it is not adapted to smooth covariance matrices of large dimensions, such as 10,000. We introduce two new methods that circumvent these problems: 1) a fast implementation of the sandwich smoother for covariance smoothing; and 2) a two-step procedure that first obtains the singular value decomposition of the data matrix and then smooths the eigenvectors. These new approaches are at least an order of magnitude faster in high dimensions and drastically reduce computer memory requirements. The new approaches provide instantaneous (a few seconds) smoothing for matrices of dimension 10,000 and very fast (< 10 minutes) smoothing for matrices of dimension 100,000. R functions, simulations, and data analysis provide ready-to-use, reproducible, and scalable tools for practical data analysis of noisy high-dimensional functional data.
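The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' R implementation. It assumes the data are stored as a matrix with one row per function, and it swaps in a simple moving average as a stand-in for the smoother used in the paper. The function names `moving_average` and `two_step_covariance` and the parameters `n_components` and `width` are hypothetical. The point of the sketch is that the full covariance matrix is never formed: only the thin SVD of the data matrix and a few smoothed vectors are kept.

```python
import numpy as np


def moving_average(v, width):
    """Stand-in smoother (an assumption, not the paper's smoother):
    centered moving average, renormalized near the edges."""
    kernel = np.ones(width)
    num = np.convolve(v, kernel, mode="same")
    den = np.convolve(np.ones_like(v), kernel, mode="same")
    return num / den


def two_step_covariance(Y, n_components, width=11):
    """Hypothetical helper. Y has shape (n_curves, n_points).

    Returns (Phi, lam) so that the smoothed covariance is approximated
    by Phi @ np.diag(lam) @ Phi.T without ever storing that matrix.
    """
    n_curves = Y.shape[0]
    Yc = Y - Y.mean(axis=0)  # center each grid point across curves

    # Step 1: thin SVD of the centered data matrix.
    _, d, Vt = np.linalg.svd(Yc, full_matrices=False)
    V = Vt[:n_components].T              # (n_points, n_components)
    lam = d[:n_components] ** 2 / n_curves

    # Step 2: smooth each retained right singular vector.
    Phi = np.column_stack(
        [moving_average(V[:, k], width) for k in range(n_components)]
    )
    return Phi, lam


# Simulated noisy functional data: 50 curves on a grid of 2000 points.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
basis = np.vstack([
    np.sqrt(2) * np.sin(2 * np.pi * t),
    np.sqrt(2) * np.cos(2 * np.pi * t),
])
scores = rng.normal(size=(50, 2)) * np.array([2.0, 1.0])
Y = scores @ basis + rng.normal(scale=0.5, size=(50, 2000))

Phi, lam = two_step_covariance(Y, n_components=2)
```

Here storage is proportional to the grid size times the number of retained components rather than to the square of the grid size. Note that the smoothed vectors in `Phi` are no longer exactly orthonormal; whether and how to re-orthogonalize them is left open in this sketch.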