Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich, Jülich, Germany.
Hum Brain Mapp. 2024 Apr 15;45(6):e26683. doi: 10.1002/hbm.26683.
Machine learning (ML) approaches are increasingly being applied to neuroimaging data. Studies in neuroscience typically have to rely on a limited set of training data, which may impair the generalizability of ML models. However, it is still unclear which kind of training sample is best suited to optimize generalization performance. In the present study, we systematically investigated the generalization performance of sex classification models trained on the parcelwise connectivity profile of either single samples or compound samples of two different sizes. Generalization performance was quantified in terms of mean across-sample classification accuracy and spatial consistency of accurately classifying parcels. Our results indicate that the generalization performance of parcelwise classifiers (pwCs) trained on single dataset samples depends on the specific test samples. Certain datasets seem to "match" in the sense that classifiers trained on a sample from one dataset achieved a high accuracy when tested on a sample from the respective other dataset, and vice versa. The pwCs trained on the compound samples demonstrated the overall highest generalization performance for all test samples, including one derived from a dataset not included in building the training samples. Thus, our results indicate that both a large sample size and a heterogeneous data composition of a training sample play a central role in achieving generalizable results.
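The cross-sample evaluation scheme described above can be sketched in a few lines: one classifier per parcel is fit on a training sample and scored on a sample from a different dataset, yielding a per-parcel accuracy map whose mean gives the across-sample accuracy. This is a minimal illustrative sketch, not the authors' code; the synthetic data, parcel count, and use of scikit-learn's logistic regression are all assumptions made for the example.

```python
# Minimal sketch (NOT the authors' pipeline) of parcelwise cross-sample
# sex classification: fit one classifier per parcel on a training sample,
# test on a sample from a different dataset, and summarize per-parcel
# accuracies. All data here are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_parcels, n_features = 10, 50  # hypothetical parcellation size / profile length

def make_sample(n_subjects, shift):
    """Synthetic stand-in for one dataset: connectivity profiles + sex labels."""
    y = rng.integers(0, 2, n_subjects)
    X = rng.normal(size=(n_subjects, n_parcels, n_features))
    X += shift * y[:, None, None]  # inject a weak class-related signal
    return X, y

X_train, y_train = make_sample(200, shift=0.3)  # "training sample"
X_test, y_test = make_sample(100, shift=0.3)    # sample from another dataset

# One parcelwise classifier (pwC) per parcel, scored across samples
acc = np.empty(n_parcels)
for p in range(n_parcels):
    clf = LogisticRegression(max_iter=1000).fit(X_train[:, p, :], y_train)
    acc[p] = clf.score(X_test[:, p, :], y_test)

print(f"mean across-sample accuracy: {acc.mean():.2f}")
print(f"parcels above chance (>0.5): {(acc > 0.5).sum()} / {n_parcels}")
```

Spatial consistency, the paper's second criterion, would then compare which parcels exceed chance level across different train/test pairings, rather than only the mean accuracy.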