Ensemble feature selection and tabular data augmentation with generative adversarial networks to enhance cutaneous melanoma identification and interpretability.

Author information

Gómez-Martínez Vanesa, Chushig-Muzo David, Veierød Marit B, Granja Conceição, Soguero-Ruiz Cristina

Affiliations

Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, 28943, Spain.

Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway.

Publication information

BioData Min. 2024 Oct 30;17(1):46. doi: 10.1186/s13040-024-00397-7.

DOI:10.1186/s13040-024-00397-7

PMID:39478549

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11526724/

Abstract

BACKGROUND

Cutaneous melanoma is the most aggressive form of skin cancer and is responsible for most skin cancer-related deaths. Recent advances in artificial intelligence, together with the availability of public dermoscopy image datasets, have made it possible to assist dermatologists in identifying melanoma. While image feature extraction holds potential for melanoma detection, it often leads to high-dimensional data. Furthermore, most image datasets suffer from class imbalance, where a few classes have numerous samples while others are under-represented.

METHODS

In this paper, we propose combining ensemble feature selection (FS) methods with data augmentation based on conditional tabular generative adversarial networks (CTGAN) to enhance melanoma identification in imbalanced datasets. We used dermoscopy images from two public datasets, PH2 and Derm7pt, which contain melanoma and not-melanoma lesions. To capture intrinsic information from skin lesions, we applied two feature extraction (FE) approaches: handcrafted features and embedding features. For the former, color, geometric, and first-, second-, and higher-order texture features were extracted; for the latter, embeddings were obtained using ResNet-based models. To alleviate the high dimensionality produced by FE, ensemble FS with filter methods was used and evaluated. For data augmentation, we conducted a progressive analysis of the imbalance ratio (IR), which is related to the number of synthetic samples created, and evaluated its impact on the predictive results. To gain interpretability of the predictive models, we used SHAP, bootstrap resampling statistical tests, and UMAP visualizations.
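
The ensemble FS step can be illustrated with a minimal sketch. The abstract does not say how the outputs of the individual filter methods are aggregated, so mean-rank aggregation is used here purely as an assumption. The feature names, the scores, and the helpers `rank_features` and `ensemble_select` are all hypothetical.

```python
from statistics import mean


def rank_features(scores):
    """Map each feature to its rank; rank 1 is the highest score.

    Ties are broken alphabetically by feature name so the result is stable.
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def ensemble_select(score_tables, k):
    """Average the per-method ranks and keep the k best-ranked features.

    Assumption: mean-rank aggregation; the paper may aggregate differently.
    """
    features = list(score_tables[0])
    ranks = [rank_features(table) for table in score_tables]
    mean_rank = {f: mean(r[f] for r in ranks) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical relevance scores from three filter methods (higher = better).
filter_scores = [
    {"color_mean": 0.9, "glcm_contrast": 0.4, "area": 0.2, "entropy": 0.6},
    {"color_mean": 0.7, "glcm_contrast": 0.8, "area": 0.1, "entropy": 0.5},
    {"color_mean": 0.8, "glcm_contrast": 0.3, "area": 0.4, "entropy": 0.9},
]

selected = ensemble_select(filter_scores, k=2)
print(selected)
```

Averaging ranks rather than raw scores keeps filter methods with different score scales comparable, which is one common reason to prefer rank-based aggregation in an ensemble.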

RESULTS

The combination of ensemble FS, CTGAN, and linear models achieved the best predictive results, with AUCROC values of 87% (support vector machine, IR=0.9) for PH2 and 76% (LASSO, IR=1.0) for Derm7pt. We also found that melanoma lesions were mainly characterized by color-related features, whereas not-melanoma lesions were characterized by texture features.
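
To make the IR values above concrete, the following sketch computes how many synthetic minority samples would be needed to reach a target IR. It assumes IR is defined as the minority-class count divided by the majority-class count, so that IR=1.0 means a balanced dataset; the abstract does not state the definition. The class counts and the helper `synthetic_needed` are hypothetical.

```python
def synthetic_needed(n_minority, n_majority, target_ir):
    """Number of synthetic minority samples needed to reach target_ir.

    Assumption: IR = minority count / majority count (IR = 1.0 is balanced).
    Returns 0 when the dataset already meets or exceeds the target.
    """
    target_minority = round(target_ir * n_majority)
    return max(0, target_minority - n_minority)


# Hypothetical class counts, chosen only for illustration.
n_melanoma, n_not_melanoma = 40, 160

for ir in (0.5, 0.75, 0.9, 1.0):
    print(ir, synthetic_needed(n_melanoma, n_not_melanoma, ir))
```

Under these assumed counts, a progressive analysis would sweep the target IR upward and retrain the classifier at each step, with the generator supplying the additional minority-class rows.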

CONCLUSIONS

Our results demonstrate the effectiveness of ensemble FS and synthetic data in developing models that accurately identify melanoma. This research advances skin lesion analysis, contributing both to melanoma detection and to the interpretation of the main features used for its identification.
