Discriminating sample groups with multi-way data.

Author information

Lyu Tianmeng, Lock Eric F, Eberly Lynn E

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

Biostatistics. 2017 Jul 1;18(3):434-450. doi: 10.1093/biostatistics/kxw057.

Abstract

High-dimensional linear classifiers, such as distance weighted discrimination (DWD) and versions of the support vector machine (SVM), are commonly used in biomedical research to distinguish groups of subjects based on a large number of features. However, their use is limited to applications where a single vector of features is measured for each subject. In practice, data are often multi-way, or measured over multiple dimensions. For example, metabolite abundance may be measured over multiple regions or tissues, or gene expression may be measured over multiple time points, for the same subjects. We propose a framework for linear classification of high-dimensional multi-way data, in which coefficients can be factorized into weights that are specific to each dimension. More generally, the coefficients for each measurement in a multi-way dataset are assumed to have low-rank structure. This framework extends existing classification techniques from single vector to multi-way features, and we have implemented multi-way versions of SVM and DWD. We describe informative simulation results, and apply multi-way DWD to data for two very different clinical research studies. The first study uses magnetic resonance spectroscopy metabolite data over multiple brain regions to compare participants with and without spinocerebellar ataxia; the second uses publicly available gene expression time-course data to compare degrees of treatment response among patients with multiple sclerosis. Our multi-way method can improve performance and simplify interpretation over naive applications of full-rank linear and non-linear classification to multi-way data. The R package is available at https://github.com/lockEF/MultiwayClassification.
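The central idea in the abstract is a coefficient array constrained to low rank. A minimal sketch makes this concrete. Assume each subject contributes a P × Q matrix X_i, for example P metabolites measured in Q brain regions. A full linear classifier needs P·Q coefficients. The rank-1 version writes the coefficient matrix as the outer product of a feature weight vector w (length P) and a dimension weight vector v (length Q). The score is then wᵀX_i v + b, and only P + Q coefficients are estimated; a rank-R version uses R(P + Q).

This is not the API of the MultiwayClassification R package. The function names (rank1_scores, lowrank_scores, ridge_fit, fit_rank1) are hypothetical. To keep the sketch self-contained, it also swaps the SVM and DWD losses for a ridge-penalized least-squares loss on ±1 labels, and it fits w and v by alternating between the two modes. Treat both choices as assumptions, not as the paper's algorithm.

```python
import numpy as np


def rank1_scores(X, w, v):
    """Linear scores w^T X_i v for 3-way data X of shape (n, P, Q)."""
    # Equivalent to summing X[i, p, q] * w[p] * v[q] over p and q.
    return np.einsum("ipq,p,q->i", X, w, v)


def lowrank_scores(X, W, V):
    """Scores with a rank-R coefficient matrix B = W V^T (W: P x R, V: Q x R)."""
    B = W @ V.T
    return np.einsum("ipq,pq->i", X, B)


def ridge_fit(Z, y, lam):
    """Ridge least squares with an unpenalized intercept (a stand-in for SVM/DWD)."""
    n, d = Z.shape
    A = np.hstack([Z, np.ones((n, 1))])
    # Augmented-rows form of ridge: extra rows sqrt(lam) * I penalize the slopes only.
    penalty_rows = np.hstack([np.sqrt(lam) * np.eye(d), np.zeros((d, 1))])
    A_aug = np.vstack([A, penalty_rows])
    y_aug = np.concatenate([y, np.zeros(d)])
    coef, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return coef[:-1], coef[-1]


def fit_rank1(X, y, lam=1.0, n_iter=25, seed=0):
    """Alternating fit of a rank-1 linear classifier (hypothetical helper).

    Objective: sum_i (y_i - b - w^T X_i v)^2 + lam * ||w||^2 * ||v||^2, where
    ||w||^2 * ||v||^2 is the squared Frobenius norm of outer(w, v). Each half-step
    minimizes this exactly over one block, so the objective never increases;
    the result is a local solution, not necessarily the global one.
    """
    rng = np.random.default_rng(seed)
    n, P, Q = X.shape
    v = rng.normal(size=Q)
    v /= np.linalg.norm(v)
    w, b = np.zeros(P), 0.0
    for _ in range(n_iter):
        # Fix v (unit norm): collapsing the Q mode gives an ordinary n x P design.
        w, b = ridge_fit(np.einsum("ipq,q->ip", X, v), y, lam)
        # Fix w: collapsing the P mode gives an n x Q design; the penalty scales by ||w||^2.
        v, b = ridge_fit(np.einsum("ipq,p->iq", X, w), y, lam * (w @ w))
        s = np.linalg.norm(v)
        if s == 0.0:
            break  # degenerate case: all coefficients shrunk to zero
        # Move the scale of v into w; the score and the penalty are unchanged.
        v, w = v / s, w * s
    return w, v, b
```

Because wᵀX_i v = (s·w)ᵀX_i(v/s) for any s ≠ 0, w and v are identified only up to scale and sign. The sketch therefore fixes ‖v‖ = 1. It uses the penalty λ‖w‖²‖v‖², which is invariant to that rescaling. Replacing ridge_fit with an SVM or DWD solver for the two sub-problems would give a classifier closer to the multi-way versions named in the abstract.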

Similar articles

1
Discriminating sample groups with multi-way data.
Biostatistics. 2017 Jul 1;18(3):434-450. doi: 10.1093/biostatistics/kxw057.
2
Multiway sparse distance weighted discrimination.
J Comput Graph Stat. 2023;32(2):730-743. doi: 10.1080/10618600.2022.2099404. Epub 2022 Aug 30.
3
Bayesian Distance Weighted Discrimination.
J Comput Graph Stat. 2022;31(4):1177-1188. doi: 10.1080/10618600.2022.2069778. Epub 2022 May 26.

Cited by

1
Multiway sparse distance weighted discrimination.
J Comput Graph Stat. 2023;32(2):730-743. doi: 10.1080/10618600.2022.2099404. Epub 2022 Aug 30.
2
Bayesian predictive modeling of multi-source multi-way data.
Comput Stat Data Anal. 2023 Oct;186. doi: 10.1016/j.csda.2023.107783. Epub 2023 May 19.
3
Bayesian Distance Weighted Discrimination.
J Comput Graph Stat. 2022;31(4):1177-1188. doi: 10.1080/10618600.2022.2069778. Epub 2022 May 26.
7
Tensor-on-tensor regression.
J Comput Graph Stat. 2018;27(3):638-647. doi: 10.1080/10618600.2017.1401544. Epub 2018 Jun 6.
8
Supervised multiway factorization.
Electron J Stat. 2018;12(1):1150-1180. doi: 10.1214/18-EJS1421. Epub 2018 Mar 27.
