Department of Neuroscience and Biomedical Engineering, Aalto University, Finland.
Neuroimage. 2020 Jan 1;204:116221. doi: 10.1016/j.neuroimage.2019.116221. Epub 2019 Sep 26.
Linear machine learning models "learn" a data transformation by being exposed to examples of input with the desired output, forming the basis for a variety of powerful techniques for analyzing neuroimaging data. However, their ability to learn the desired transformation is limited by the quality and size of the example dataset, which in neuroimaging studies is often notoriously noisy and small. In these cases, it is desirable to fine-tune the learned linear model using domain information beyond the example dataset. To this end, we present a framework that decomposes the weight matrix of a fitted linear model into three subcomponents: the data covariance, the identified signal of interest, and a normalizer. Inspecting these subcomponents in isolation provides an intuitive way to inspect the inner workings of the model and assess its strengths and weaknesses. Furthermore, the three subcomponents may be altered, which provides a straightforward way to inject prior information and impose additional constraints. We refer to this process as "post-hoc modification" of a model and demonstrate how it can be used to achieve precise control over which aspects of the model are fitted to the data through machine learning and which are determined through domain information. As an example use case, we decode the associative strength between words from electroencephalography (EEG) reading data. Our results show how the decoding accuracy of two example linear models (ridge regression and logistic regression) can be boosted by incorporating information about the spatio-temporal nature of the data, domain information about the N400 evoked potential and data from other participants.
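The decomposition described in the abstract can be illustrated with a small numerical sketch. The abstract names the three subcomponents (data covariance, signal of interest, normalizer) but does not give their formulas, so the definitions below are assumptions: the signal of interest is taken to be the activation pattern `cov_X @ W @ inv(cov_yhat)`, the normalizer is taken to be the covariance of the model output, and the weights are rebuilt as `inv(cov_X) @ pattern @ normalizer`. The synthetic data, the helper names `decompose` and `recompose`, and the choice of covariance shrinkage as the post-hoc modification are all hypothetical and only serve to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10

# Synthetic, illustrative data (not EEG): one target driven by 10 features.
X = rng.standard_normal((n_samples, n_features))
true_w = rng.standard_normal((n_features, 1))
y = X @ true_w + 0.5 * rng.standard_normal((n_samples, 1))

# Center the data so that covariances are simple cross-products.
X = X - X.mean(axis=0)
y = y - y.mean(axis=0)

# Fit ridge regression in closed form: W = (X^T X + alpha I)^{-1} X^T y.
alpha = 1.0
W = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def decompose(W, X):
    """Split W into (covariance, pattern, normalizer); assumed definitions."""
    cov_X = X.T @ X / (len(X) - 1)
    cov_yhat = W.T @ cov_X @ W  # covariance of the model output
    pattern = cov_X @ W @ np.linalg.inv(cov_yhat)
    normalizer = cov_yhat
    return cov_X, pattern, normalizer


def recompose(cov_X, pattern, normalizer):
    """Rebuild the weights as inv(cov_X) @ pattern @ normalizer."""
    return np.linalg.solve(cov_X, pattern) @ normalizer


cov_X, pattern, normalizer = decompose(W, X)

# Without any modification, recomposition returns the original weights.
recomposition_ok = np.allclose(W, recompose(cov_X, pattern, normalizer))

# Post-hoc modification (illustrative): shrink the covariance toward a
# scaled identity, keep the pattern and normalizer, and rebuild the weights.
shrinkage = 0.2
target = np.trace(cov_X) / n_features * np.eye(n_features)
cov_modified = (1 - shrinkage) * cov_X + shrinkage * target
W_modified = recompose(cov_modified, pattern, normalizer)
```

In this sketch only the covariance is altered. Under the same assumed definitions, the pattern could instead be replaced or constrained using prior knowledge (for example, a template of the expected evoked response) before calling `recompose`, which is the kind of control over individual subcomponents that the abstract describes.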