Dang Tung, Fermin Alan S R, Machizawa Maro G
Center for Brain, Mind, and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan.
Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan.
Front Neuroinform. 2023 Sep 26;17:1266713. doi: 10.3389/fninf.2023.1266713. eCollection 2023.
The complexity and high dimensionality of neuroimaging data make it difficult to decode information with machine learning (ML) models, because the number of features is often much larger than the number of observations. Feature selection is a crucial step in identifying meaningful target features for decoding; however, optimizing feature selection on such high-dimensional neuroimaging data has been challenging with conventional ML models. Here, we introduce an efficient, high-performance decoding package that combines a forward variable selection (FVS) algorithm with hyperparameter optimization to automatically identify the best feature subsets for both classification and regression models; a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates goodness-of-fit across the different models using k-fold cross-validation, identifying the best subset of features for each model according to a predefined criterion. Next, the hyperparameters of each ML model are optimized at each forward iteration. The final output reports, for each model, the optimized set of selected features (brain regions of interest) together with its accuracy. Furthermore, the toolbox can be executed in a parallel environment for efficient computation on a typical personal computer. Using the optimized forward variable selection decoder (oFVSD) pipeline, we verified its effectiveness for sex classification and age-range regression on 1,113 structural magnetic resonance imaging (MRI) datasets.
We compared oFVSD against ML models without the FVS algorithm and against models using the Boruta algorithm as an alternative variable selection method. Across all ML models, oFVSD significantly outperformed the counterpart models without FVS (on average, an increase of approximately 0.20 in the correlation coefficient, r, for regression models and an 8% increase for classification models) and the models with Boruta variable selection (an improvement of approximately 0.07 for regression models and 4% for classification models). Furthermore, we confirmed that parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, as demonstrated in a use case on MRI datasets. Owing to its flexibility, oFVSD has the potential to be applied to many other neuroimaging modalities. As an open-source, freely available Python package, it is a valuable toolbox for research communities seeking improved decoding accuracy.
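The forward selection step described in the abstract can be illustrated with a minimal sketch. This is not the oFVSD API: the function names (`fold_accuracy`, `cv_score`, `forward_select`), the stopping threshold `min_gain`, and the synthetic data are all hypothetical, and a simple nearest-centroid classifier stands in for the 18 ML models of the package. Per-iteration hyperparameter optimization and parallel execution are omitted. The sketch only shows the greedy idea: at each iteration, add the feature that most improves the k-fold cross-validated score, and stop when no candidate improves it.

```python
import random
from statistics import mean


def fold_accuracy(X_tr, y_tr, X_te, y_te, features):
    """Nearest-centroid classifier (a stand-in model) restricted to `features`."""
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, y in zip(X_tr, y_tr) if y == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    correct = 0
    for x, y in zip(X_te, y_te):
        # Predict the class whose centroid is closest in squared distance.
        pred = min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += pred == y
    return correct / len(y_te)


def cv_score(X, y, features, k=5):
    """Mean accuracy over k contiguous folds (rows are assumed already mixed)."""
    n = len(y)
    scores = []
    for i in range(k):
        test = set(range(i * n // k, (i + 1) * n // k))
        X_tr = [X[j] for j in range(n) if j not in test]
        y_tr = [y[j] for j in range(n) if j not in test]
        X_te = [X[j] for j in sorted(test)]
        y_te = [y[j] for j in sorted(test)]
        scores.append(fold_accuracy(X_tr, y_tr, X_te, y_te, features))
    return mean(scores)


def forward_select(X, y, k=5, min_gain=1e-3):
    """Greedy forward selection: add the feature with the best CV gain each round."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        score, feat = max((cv_score(X, y, selected + [f], k), f) for f in remaining)
        if score - best < min_gain:  # hypothetical stopping criterion
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Synthetic data (illustrative only): 8 features, of which 0 and 3 carry signal.
rng = random.Random(0)
X, y = [], []
for i in range(200):
    label = i % 2  # alternating labels keep the contiguous folds balanced
    row = [rng.gauss(0.0, 1.0) for _ in range(8)]
    row[0] = rng.gauss(2.0 * label, 1.0)
    row[3] = rng.gauss(-2.0 * label, 1.0)
    X.append(row)
    y.append(label)

selected, best = forward_select(X, y)
print("selected features:", selected, "cv accuracy:", round(best, 3))
```

In a fuller pipeline, the scoring function would be swapped for each candidate ML model in turn, and a hyperparameter search would run inside each forward iteration, as the abstract describes.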