Lara-Abelenda Francisco J, Chushig-Muzo David, Peiro-Corbacho Pablo, Gómez-Martínez Vanesa, Wägner Ana M, Granja Conceição, Soguero-Ruiz Cristina
Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain.
Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain.
J Biomed Inform. 2025 May;165:104821. doi: 10.1016/j.jbi.2025.104821. Epub 2025 Apr 8.
Machine learning (ML) models have been used extensively for tabular data classification, but recent work has sought to transform tabular data into images in order to leverage the predictive performance of convolutional neural networks (CNNs). However, most of these approaches cannot convert data with few samples and mixed-type features. This study has two aims: (i) to evaluate the performance of the tabular-to-image method named low mixed-image generator for tabular data (LM-IGTD); and (ii) to assess the effectiveness of transfer learning and fine-tuning for improving predictions on tabular data.
We employed two public tabular datasets of patients diagnosed with cardiovascular diseases (CVDs): Framingham and Steno. First, both datasets were transformed into images using LM-IGTD. Then, Framingham, which contains more samples than Steno, was used to train CNN-based models. Finally, we performed transfer learning and fine-tuning with the pre-trained CNN on the Steno dataset to predict CVD risk.
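The pre-train, transfer, and fine-tune sequence described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's implementation: a tiny dense network stands in for the CNN, the synthetic `source` and `target` sets stand in for Framingham and Steno, and the LM-IGTD image conversion is skipped entirely. It also assumes the common usage in which "transfer learning" retrains only the output head on frozen pre-trained weights and "fine-tuning" then updates all weights with a smaller learning rate; the abstract does not define either term. All names (`make`, `train`, `freeze_extractor`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, v):
    # Hidden layer W plays the role of the feature extractor; v is the output head.
    h = [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]
    p = sigmoid(sum(vi * hi for vi, hi in zip(v, h)))
    return h, p

def train(data, W, v, epochs=50, lr=0.1, freeze_extractor=False):
    """Plain SGD on the log-loss; updates W and v in place."""
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(x, W, v)
            err = p - y  # gradient of the log-loss w.r.t. the head's input
            if not freeze_extractor:
                for i, row in enumerate(W):
                    grad_h = err * v[i] * (1.0 - h[i] ** 2)
                    for j in range(len(row)):
                        row[j] -= lr * grad_h * x[j]
            for i in range(len(v)):
                v[i] -= lr * err * h[i]

def make(n, shift):
    # Synthetic binary task (assumption): label depends on the first two features.
    out = []
    for _ in range(n):
        x = [random.uniform(-1, 1) for _ in range(4)]
        out.append((x, 1 if x[0] + x[1] > shift else 0))
    return out

random.seed(0)
source = make(200, 0.0)  # stand-in for the larger dataset
target = make(20, 0.2)   # stand-in for the smaller, related dataset

W = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
v = [random.uniform(-0.5, 0.5) for _ in range(3)]

# 1) Pre-train every weight on the larger dataset.
train(source, W, v)
W_pretrained, v_pretrained = [row[:] for row in W], v[:]

# 2) Transfer learning: keep the extractor frozen, retrain only the head.
train(target, W, v, epochs=20, freeze_extractor=True)
W_after_transfer, v_after_transfer = [row[:] for row in W], v[:]

# 3) Fine-tuning: unfreeze everything and continue with a smaller learning rate.
train(target, W, v, epochs=20, lr=0.01)
```

After step 2 the extractor weights are identical to the pre-trained ones and only the head has moved; step 3 is the point at which the extractor itself starts adapting to the smaller dataset.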
The CNN-based model with transfer learning achieved the highest area under the receiver operating characteristic curve (AUCROC) on Steno (0.855), outperforming ML models such as decision trees, K-nearest neighbors, least absolute shrinkage and selection operator (LASSO), support vector machine, and TabPFN. This approach improved accuracy by 2% over the best-performing traditional model, TabPFN.
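For readers unfamiliar with the metric reported above, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy labels and scores are illustrative assumptions and are unrelated to the study's data.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for small cohorts; a rank-based formulation gives the same value more efficiently on larger ones.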
To the best of our knowledge, this is the first study to evaluate the effectiveness of applying transfer learning and fine-tuning to tabular data using tabular-to-image approaches. By exploiting the predictive capabilities of CNNs, our work also advances the diagnosis of CVD by providing a framework for early clinical intervention and decision-making support.