Guo Bin, Eberly Lynn E, Henry Pierre-Gilles, Lenglet Christophe, Lock Eric F
Division of Biostatistics, School of Public Health.
Center for Magnetic Resonance Research, University of Minnesota.
J Comput Graph Stat. 2023;32(2):730-743. doi: 10.1080/10618600.2022.2099404. Epub 2022 Aug 30.
Modern data often take the form of a multiway array. However, most classification methods are designed for vectors, i.e., 1-way arrays. Distance weighted discrimination (DWD) is a popular high-dimensional classification method that has been extended to the multiway context, with dramatic improvements in performance when data have multiway structure. However, the previous implementation of multiway DWD was restricted to classification of matrices, and did not account for sparsity. In this paper, we develop a general framework for multiway classification which is applicable to any number of dimensions and any degree of sparsity. We conducted extensive simulation studies, showing that our model is robust to the degree of sparsity and improves classification accuracy when the data have multiway structure. For our motivating application, magnetic resonance spectroscopy (MRS) was used to measure the abundance of several metabolites across multiple neurological regions and across multiple time points in a mouse model of Friedreich's ataxia, yielding a four-way data array. Our method reveals a robust and interpretable multi-region metabolomic signal that discriminates the groups of interest. We also successfully apply our method to gene expression time course data for multiple sclerosis treatment. An R implementation is available in the package MultiwayClassification at http://github.com/lockEF/MultiwayClassification.
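To make the idea of a multiway linear classifier concrete, the sketch below scores a three-way array with a rank-1 coefficient array, where the coefficient for each cell is the product of one weight per dimension. The abstract does not state the model form, so the rank-1 structure, the function names (`rank1_score`, `make_array`), the dimensions, and all numeric weights are illustrative assumptions. This is not the MultiwayClassification package API, and it does not fit a DWD model; it only shows how a multiway coefficient structure reduces the parameter count and how a zero weight removes an entire slice of the array.

```python
from itertools import product
from math import prod


def rank1_score(x, weights, intercept=0.0):
    """Score a K-way array x (nested lists) with a rank-1 coefficient array.

    Assumed model: the coefficient for cell (i1, ..., iK) is
    weights[0][i1] * ... * weights[K-1][iK].
    """
    dims = [len(w) for w in weights]
    total = intercept
    for idx in product(*(range(d) for d in dims)):
        cell = x
        for i in idx:  # walk down the nested lists to the cell
            cell = cell[i]
        coef = prod(w[i] for w, i in zip(weights, idx))
        total += coef * cell
    return total


def make_array(dims, fill):
    """Build a nested-list array of shape dims; each cell is fill(index)."""
    def build(prefix, rest):
        if not rest:
            return fill(tuple(prefix))
        return [build(prefix + [i], rest[1:]) for i in range(rest[0])]
    return build([], list(dims))


# Hypothetical layout: 3 metabolites x 2 regions x 2 time points.
dims = (3, 2, 2)
weights = [
    [1.0, 0.0, -1.0],  # metabolite weights; the zero drops metabolite 1 entirely
    [0.5, 0.5],        # region weights
    [1.0, 2.0],        # time-point weights
]
x = make_array(dims, lambda idx: float(sum(idx)))  # toy data, not real measurements

score = rank1_score(x, weights)
label = 1 if score >= 0 else -1  # sign of the score gives the predicted class

n_full = prod(dims)   # coefficients for a classifier on the vectorized array
n_rank1 = sum(dims)   # weights needed under the assumed rank-1 structure
print(score, label, n_full, n_rank1)
```

Under these assumptions the vectorized classifier needs 12 coefficients while the rank-1 version needs 7, and the gap widens quickly as dimensions are added or lengthened.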