Sex classification from functional brain connectivity: Generalization to multiple datasets. Generalizability of sex classifiers.

Author information

Wiersch Lisa, Friedrich Patrick, Hamdan Sami, Komeyer Vera, Hoffstaedter Felix, Patil Kaustubh R, Eickhoff Simon B, Weis Susanne

Affiliations

Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.

Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich, Jülich, Germany.

Publication information

bioRxiv. 2024 Mar 20:2023.08.30.555495. doi: 10.1101/2023.08.30.555495.

Abstract

Machine learning (ML) approaches are increasingly being applied to neuroimaging data. Studies in neuroscience typically have to rely on a limited set of training data, which may impair the generalizability of ML models. However, it is still unclear which kind of training sample is best suited to optimize generalization performance. In the present study, we systematically investigated the generalization performance of sex classification models (parcelwise classifiers, pwCs) trained on the parcelwise connectivity profile of either single samples or a compound sample containing data from four different datasets. Generalization performance was quantified in terms of mean across-sample classification accuracy and spatial consistency of accurately classifying parcels. Our results indicate that the generalization performance of pwCs trained on single dataset samples depends on the specific test samples. Certain datasets seem to "match" in the sense that classifiers trained on a sample from one dataset achieved a high accuracy when tested on the respective other one and vice versa. The pwC trained on the compound sample demonstrated the overall highest generalization performance for all test samples, including one derived from a dataset not included in building the training samples. Thus, our results indicate that a large and heterogeneous training sample comprising data from multiple datasets is best suited to achieve generalizable results.
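
The evaluation design described in the abstract (train on a single sample or on a compound sample, then average accuracy over test samples) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the dataset names `A` to `E` are hypothetical, and a nearest-centroid classifier stands in for whatever model the authors used. It is not the authors' pipeline and its printed numbers are not results from the study.

```python
import random
from statistics import mean


def simulate_sample(n, site_shift, rng, effect=0.5):
    """Synthetic 'connectivity' features with a sex effect plus a dataset-specific shift."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced binary labels
        sign = effect if label else -effect
        X.append([rng.gauss(shift + sign, 1.0) for shift in site_shift])
        y.append(label)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, X, y):
    return mean(1.0 if predict(centroids, x) == t else 0.0 for x, t in zip(X, y))


rng = random.Random(0)
names = ["A", "B", "C", "D", "E"]  # hypothetical datasets; E is never used for training
shifts = {n: [rng.gauss(0.0, 0.5) for _ in range(5)] for n in names}

train_samples = {n: simulate_sample(100, shifts[n], rng) for n in names[:4]}
test_samples = {n: simulate_sample(100, shifts[n], rng) for n in names}

# Compound training sample: pooled data from the four training datasets.
X_all = [x for n in names[:4] for x in train_samples[n][0]]
y_all = [t for n in names[:4] for t in train_samples[n][1]]
train_samples["compound"] = (X_all, y_all)

results = {}
for name, (X, y) in train_samples.items():
    model = fit_centroids(X, y)
    # Single-sample classifiers are scored on test samples from the other datasets;
    # the compound classifier is scored on all five test samples (an assumption here).
    accs = [accuracy(model, *test_samples[t]) for t in test_samples if t != name]
    results[name] = mean(accs)

for name, acc in results.items():
    print(f"trained on {name}: mean across-sample accuracy = {acc:.3f}")
```

The per-dataset shift is a crude stand-in for site or acquisition differences; varying its spread is one way to explore how heterogeneity between datasets affects the comparison between single and compound training samples.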

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9440/10958555/0c36c449afdf/nihpp-2023.08.30.555495v2-f0001.jpg
