Canonical Correlation Analysis and Partial Least Squares for Identifying Brain-Behavior Associations: A Tutorial and a Comparative Study.

Affiliations

Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom.

Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom.

Publication information

Biol Psychiatry Cogn Neurosci Neuroimaging. 2022 Nov;7(11):1055-1067. doi: 10.1016/j.bpsc.2022.07.012. Epub 2022 Aug 8.

Abstract

Canonical correlation analysis (CCA) and partial least squares (PLS) are powerful multivariate methods for capturing associations across 2 modalities of data (e.g., brain and behavior). However, when the sample size is similar to or smaller than the number of variables in the data, standard CCA and PLS models may overfit, i.e., find spurious associations that generalize poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have some limitations. This work gives a theoretical and practical introduction to the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimizing the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and to real data from the Human Connectome Project and the Alzheimer's Disease Neuroimaging Initiative (both with n > 500). We use both low- and high-dimensionality versions of these data (i.e., ratios of sample size to number of variables of ∼1-10 and ∼0.1-0.01, respectively) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.
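The overfitting argument can be made concrete with a minimal NumPy sketch of the first mode of CCA, PLS and L2-regularized CCA. This is not the authors' code. It assumes a common formulation in which a single parameter `c` shrinks each within-modality covariance toward the identity. With that formulation, `c = 0` gives standard CCA and `c = 1` gives PLS, i.e. the singular value decomposition of the cross-covariance matrix. The helper names `first_mode` and `train_test_correlation` and the parameter `c` are hypothetical and used only for illustration. The data are simulated pure noise with a training sample-to-variable ratio of 2.5, inside the low-dimensionality range mentioned in the abstract.

```python
import numpy as np


def _inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def first_mode(x, y, c=0.0):
    """Return weights (w_x, w_y) of the first associative mode.

    Hypothetical helper (assumed formulation): c = 0 gives standard CCA,
    c = 1 gives PLS (SVD of the cross-covariance), 0 < c < 1 gives
    L2-regularized CCA.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx, syy, sxy = x.T @ x / (n - 1), y.T @ y / (n - 1), x.T @ y / (n - 1)
    # Shrink each within-modality covariance toward the identity matrix.
    kx = _inv_sqrt((1 - c) * sxx + c * np.eye(x.shape[1]))
    ky = _inv_sqrt((1 - c) * syy + c * np.eye(y.shape[1]))
    # The leading singular pair of the whitened cross-covariance gives the mode.
    u, _, vt = np.linalg.svd(kx @ sxy @ ky)
    return kx @ u[:, 0], ky @ vt[0]


def train_test_correlation(x_tr, y_tr, x_te, y_te, c=0.0):
    """Correlation of the latent scores in training and held-out data."""
    wx, wy = first_mode(x_tr, y_tr, c)
    r_train = np.corrcoef(x_tr @ wx, y_tr @ wy)[0, 1]
    r_test = np.corrcoef(x_te @ wx, y_te @ wy)[0, 1]
    return r_train, r_test


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_test, p, q = 100, 1000, 40, 40  # n/p = 2.5 in the training set
    # Pure noise: x and y are independent, so any association found is spurious.
    x = rng.standard_normal((n_train + n_test, p))
    y = rng.standard_normal((n_train + n_test, q))
    for c in (0.0, 0.5, 1.0):
        r_tr, r_te = train_test_correlation(
            x[:n_train], y[:n_train], x[n_train:], y[n_train:], c
        )
        print(f"c={c:.1f}  train r={r_tr:.2f}  test r={r_te:.2f}")
```

On such noise, standard CCA reports a high in-sample canonical correlation, while the held-out correlation stays near zero. CCA maximizes the in-sample correlation by construction, so no other choice of `c` can exceed it on the training data. This is why `c` would be selected on held-out data rather than on the training fit.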
