Effect of machine learning re-sampling techniques for imbalanced datasets in 18F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients.

Author information

Xie Chenyi, Du Richard, Ho Joshua Wk, Pang Herbert H, Chiu Keith Wh, Lee Elaine Yp, Vardhanabhuti Varut

Affiliations

Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong SAR, China.

School of Biomedical Science, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.

Publication information

Eur J Nucl Med Mol Imaging. 2020 Nov;47(12):2826-2835. doi: 10.1007/s00259-020-04756-4. Epub 2020 Apr 6.

DOI: 10.1007/s00259-020-04756-4

PMID: 32253486

Abstract

PURPOSE

Biomedical data are frequently imbalanced, which makes achieving good predictive performance with data-driven machine learning approaches a challenging task. In this study, we investigated the impact of re-sampling techniques for imbalanced datasets on a PET radiomics-based prognostication model in head and neck cancer (HNC) patients.
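
Why imbalance is a problem can be seen with a trivial classifier that always predicts the majority class: its accuracy looks high while it never identifies a minority-class case. The sketch below uses hypothetical labels (not study data), and the function name `majority_baseline_scores` is an illustrative assumption.

```python
from collections import Counter


def majority_baseline_scores(labels):
    """Score a classifier that always predicts the most frequent label.

    Assumes binary labels; returns (imbalance_ratio, accuracy, minority_recall).
    """
    counts = Counter(labels)
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    predictions = [majority] * len(labels)  # ignores the features entirely

    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    # Recall on the minority class: fraction of minority cases predicted as minority.
    minority_hits = sum(p == y == minority for p, y in zip(predictions, labels))
    return n_major / n_minor, accuracy, minority_hits / n_minor


# Hypothetical cohort: 90 patients without the event (0) and 10 with it (1).
labels = [0] * 90 + [1] * 10
ratio, acc, rec = majority_baseline_scores(labels)
print(f"imbalance ratio {ratio:.1f}, accuracy {acc:.2f}, minority recall {rec:.2f}")
```

Accuracy alone rewards this degenerate model, which is why class-aware metrics such as G-mean and per-class F-measure are used for evaluation.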

METHODS

Radiomics analysis was performed in two patient cohorts: 166 patients newly diagnosed with nasopharyngeal carcinoma (NPC) at our centre and 182 HNC patients from an open database. Conventional PET parameters and robust radiomics features were extracted for correlation analysis with overall survival (OS) and disease progression-free survival (DFS). We investigated a cross-combination of 10 re-sampling methods (oversampling, undersampling, and hybrid sampling) with 4 machine learning classifiers for survival prediction. Diagnostic performance was assessed in hold-out test sets. Statistical differences were analysed over Monte Carlo cross-validation runs using post hoc Nemenyi analysis.
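
Oversampling methods such as SMOTE create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea, assuming numeric feature vectors and Euclidean distance; `smote_sketch`, its parameters and the toy data are hypothetical and not the study's implementation (libraries such as imbalanced-learn provide production versions).

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority feature vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical two-feature minority class (e.g. two radiomics features).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points), new_points[0])
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class rather than duplicating existing rows, as random oversampling would.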

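The Nemenyi comparison ranks every compared method within each Monte Carlo run, averages the ranks, and treats two methods as different when their average ranks differ by more than a critical difference CD = q_alpha * sqrt(k(k + 1) / (6N)), with k methods and N runs; it is normally run after a Friedman test. The sketch below computes only the average ranks and CD at alpha = 0.05, using the commonly tabulated q_alpha values. The score matrix is invented for illustration and the function names are hypothetical.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05, keyed by the
# number of compared methods k (studentized range statistic divided by sqrt(2)).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(scores_per_run):
    """Average rank of each method over runs (rank 1 = highest score; ties share the mean rank)."""
    k = len(scores_per_run[0])
    totals = [0.0] * k
    for scores in scores_per_run:
        order = sorted(range(k), key=lambda j: -scores[j])
        pos = 0
        while pos < k:
            # group tied scores and give each the mean of their rank positions
            end = pos
            while end + 1 < k and scores[order[end + 1]] == scores[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2 + 1
            for j in order[pos:end + 1]:
                totals[j] += mean_rank
            pos = end + 1
    return [t / len(scores_per_run) for t in totals]


def nemenyi_cd(k, n_runs, q_table=Q_ALPHA_005):
    """Critical difference: average ranks further apart than this differ significantly."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6 * n_runs))


# Hypothetical G-mean scores of 3 methods over 4 Monte Carlo runs.
runs = [[0.70, 0.65, 0.50],
        [0.72, 0.60, 0.55],
        [0.68, 0.68, 0.52],
        [0.75, 0.62, 0.58]]
ranks = average_ranks(runs)
cd = nemenyi_cd(k=3, n_runs=4)
print(ranks, round(cd, 3))
```

Here the first and third methods differ in average rank by 1.875, more than CD (about 1.657), whereas the first two (difference 0.75) do not.
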
RESULTS

Oversampling techniques such as ADASYN and SMOTE could improve prediction performance in terms of G-mean and F-measure in the minority class, without significant loss of F-measure in the majority class. We identified an optimal PET radiomics-based prediction model of OS (AUC of 0.82, G-mean of 0.77) for our NPC cohort. Testing on the external dataset showed similar improvements from oversampling, indicating generalisability.
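
The metrics above can be computed from binary predictions as follows: G-mean is the geometric mean of sensitivity (minority recall) and specificity (majority recall), and each class gets its own F-measure. This sketch assumes the minority class is labelled 1 and uses the balanced F1 variant; the study may have used other conventions, and `class_metrics` and the toy labels are hypothetical.

```python
import math


def class_metrics(y_true, y_pred, positive=1):
    """Per-class F1 and G-mean for binary labels, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def f1(hits, false_pos, false_neg):
        denom = 2 * hits + false_pos + false_neg
        return 2 * hits / denom if denom else 0.0

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "f1_minority": f1(tp, fp, fn),
        # for the majority class the roles of false positives and false negatives swap
        "f1_majority": f1(tn, fn, fp),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical hold-out set: 3 minority (1) and 7 majority (0) cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
m = class_metrics(y_true, y_pred)
print(m)
```

Since G-mean collapses to zero whenever either class is never predicted correctly, it penalises the majority-only behaviour that plain accuracy hides.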

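ADASYN differs from SMOTE mainly in how many synthetic samples it creates around each minority sample: minority samples whose neighbourhoods contain more majority-class points, and are therefore harder to learn, receive more. The sketch below computes only that allocation step as it is commonly described (total G = (n_majority - n_minority) * beta, weights r_i = majority neighbours / k, normalised); the interpolation step would then proceed as in SMOTE. The function name, k, beta and the one-feature toy data are assumptions.

```python
import math


def adasyn_allocation(X, y, minority=1, k=5, beta=1.0):
    """Number of synthetic samples ADASYN-style weighting assigns to each minority sample.

    Returns {index_of_minority_sample: count}. Counts are rounded, so their sum
    may differ slightly from the target total.
    """
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    total = (n_maj - n_min) * beta  # G: synthetic samples needed overall

    ratios = []
    for i in minority_idx:
        # k nearest neighbours of sample i in the whole dataset (both classes)
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]
        n_majority_nb = sum(y[j] != minority for j in nearest)
        ratios.append(n_majority_nb / k)  # r_i

    norm = sum(ratios)
    if norm == 0:  # every minority sample is surrounded only by minority samples
        return {i: 0 for i in minority_idx}
    return {i: round(r / norm * total) for i, r in zip(minority_idx, ratios)}


# Hypothetical 1-feature data: 7 majority points at 0..6, 3 minority points.
X = [[float(v)] for v in range(7)] + [[10.0], [11.0], [2.5]]
y = [0] * 7 + [1] * 3
alloc = adasyn_allocation(X, y, k=2)
print(alloc)
```

The minority point at 2.5, embedded among majority points, receives twice as many synthetic samples as the two isolated minority points.
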
CONCLUSION

Our study showed that applying re-sampling techniques had a significant positive impact on prediction performance in imbalanced datasets. We have created an open-source solution for automated calculation and comparison of multiple re-sampling techniques and machine learning classifiers, allowing easy replication in future studies.
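
The open-source tool itself is not reproduced here. The sketch below only illustrates the general comparison pattern under stated assumptions: every resampler is crossed with every classifier, all combinations share the same Monte Carlo stratified hold-out splits, resampling is applied to the training fold only, and hold-out G-mean is averaged per combination. The random over/under-samplers, the nearest-centroid and 1-NN classifiers, the synthetic data and all function names are hypothetical stand-ins for the study's 10 re-sampling methods and 4 classifiers.

```python
import math
import random
from itertools import product
from statistics import mean


def no_resampling(X, y, rng):
    return X, y


def random_over(X, y, rng):
    """Duplicate random minority samples until both classes are equal in size."""
    counts = {c: y.count(c) for c in set(y)}
    minority = min(counts, key=counts.get)
    idx = [i for i, c in enumerate(y) if c == minority]
    extra = [rng.choice(idx) for _ in range(max(counts.values()) - counts[minority])]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def random_under(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = {c: y.count(c) for c in set(y)}
    n_keep = min(counts.values())
    keep = []
    for c in counts:
        keep += rng.sample([i for i, lab in enumerate(y) if lab == c], n_keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def nearest_centroid(X_train, y_train, X_test):
    centroids = {c: [mean(col) for col in zip(*[x for x, lab in zip(X_train, y_train) if lab == c])]
                 for c in set(y_train)}
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in X_test]


def one_nn(X_train, y_train, X_test):
    return [y_train[min(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))]
            for x in X_test]


def g_mean(y_true, y_pred, positive=1):
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    sens = sum(p == positive for p in pos) / len(pos)
    spec = sum(p != positive for p in neg) / len(neg)
    return math.sqrt(sens * spec)


def stratified_split(y, test_fraction, rng):
    """One Monte Carlo hold-out split that keeps the class ratio in the test set."""
    train, test = [], []
    for c in set(y):
        idx = [i for i, lab in enumerate(y) if lab == c]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def compare(X, y, resamplers, classifiers, n_runs=20, test_fraction=0.3, seed=0):
    """Mean hold-out G-mean for every resampler x classifier combination."""
    rng = random.Random(seed)
    scores = {combo: [] for combo in product(resamplers, classifiers)}
    for _ in range(n_runs):
        train, test = stratified_split(y, test_fraction, rng)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        for r_name, c_name in scores:
            # resample the training fold only; the test fold keeps its natural imbalance
            X_rs, y_rs = resamplers[r_name](X_tr, y_tr, rng)
            preds = classifiers[c_name](X_rs, y_rs, X_te)
            scores[(r_name, c_name)].append(g_mean(y_te, preds))
    return {combo: mean(vals) for combo, vals in scores.items()}


# Hypothetical imbalanced data: 80 majority and 20 minority cases, 2 features.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
     + [[data_rng.gauss(1.5, 1), data_rng.gauss(1.5, 1)] for _ in range(20)])
y = [0] * 80 + [1] * 20
results = compare(X, y,
                  {"none": no_resampling, "random_over": random_over, "random_under": random_under},
                  {"nearest_centroid": nearest_centroid, "1-NN": one_nn})
for (r, c), g in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{r:>13} + {c:<16} G-mean {g:.2f}")
```

Keeping the per-run score lists instead of their means would provide the input needed for a rank-based post hoc comparison such as the Nemenyi test.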

Similar articles

Imbalanced Data Correction Based PET/CT Radiomics Model for Predicting Lymph Node Metastasis in Clinical Stage T1 Lung Adenocarcinoma.

Front Oncol. 2022 Jan 28;12:788968. doi: 10.3389/fonc.2022.788968. eCollection 2022.

Machine learning predictive performance evaluation of conventional and fuzzy radiomics in clinical cancer imaging cohorts.

Eur J Nucl Med Mol Imaging. 2023 May;50(6):1607-1620. doi: 10.1007/s00259-023-06127-1. Epub 2023 Feb 4.

Stacking Ensemble Learning-Based [18F]FDG PET Radiomics for Outcome Prediction in Diffuse Large B-Cell Lymphoma.

J Nucl Med. 2023 Oct;64(10):1603-1609. doi: 10.2967/jnumed.122.265244. Epub 2023 Jul 27.

Radiomics analysis for the differentiation of autoimmune pancreatitis and pancreatic ductal adenocarcinoma in 18F-FDG PET/CT.

Med Phys. 2019 Oct;46(10):4520-4530. doi: 10.1002/mp.13733. Epub 2019 Aug 13.

Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy.

Front Neuroinform. 2021 Nov 19;15:715421. doi: 10.3389/fninf.2021.715421. eCollection 2021.

Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data.

Med Phys. 2017 Feb;44(2):479-496. doi: 10.1002/mp.12041.

Radiomics-based prediction of recurrence for head and neck cancer patients using data imbalanced correction.

Comput Biol Med. 2024 Sep;180:108879. doi: 10.1016/j.compbiomed.2024.108879. Epub 2024 Jul 26.

Neck Lymph Node Recurrence in HNC Patients Might Be Predicted before Radiotherapy Using Radiomics Extracted from CT Images and XGBoost Algorithm.

J Pers Med. 2022 Aug 25;12(9):1377. doi: 10.3390/jpm12091377.

Development and validation of an 18F-FDG PET radiomic model for prognosis prediction in patients with nasal-type extranodal natural killer/T cell lymphoma.

Eur Radiol. 2020 Oct;30(10):5578-5587. doi: 10.1007/s00330-020-06943-1. Epub 2020 May 20.

Cited by

Developing a multi-modal MRI radiomics-based model to predict the long-term overall survival of patients with hypopharyngeal cancer receiving definitive radiotherapy.

World J Otorhinolaryngol Head Neck Surg. 2025 Mar 24;11(3):440-448. doi: 10.1002/wjo2.70001. eCollection 2025 Sep.

Impact of harmonization and oversampling methods on radiomics analysis of multi-center imbalanced datasets: application to PET-based prediction of lung cancer subtypes.

EJNMMI Phys. 2025 Apr 7;12(1):34. doi: 10.1186/s40658-025-00750-7.

The Application of Machine Learning in Predicting the Permeability of Drugs Across the Blood Brain Barrier.

Iran J Pharm Res. 2024 Nov 24;23(1):e149367. doi: 10.5812/ijpr-149367. eCollection 2024 Jan-Dec.

Clinician-driven automated data preprocessing in nuclear medicine AI environments.

Eur J Nucl Med Mol Imaging. 2025 Mar 7. doi: 10.1007/s00259-025-07183-5.

The current landscape of artificial intelligence in oral and maxillofacial surgery- a narrative review.

Oral Maxillofac Surg. 2025 Jan 17;29(1):37. doi: 10.1007/s10006-025-01334-6.

Automatic knee osteoarthritis severity grading based on X-ray images using a hierarchical classification method.

Arthritis Res Ther. 2024 Nov 18;26(1):203. doi: 10.1186/s13075-024-03416-4.

Predicting the T790M mutation in non-small cell lung cancer (NSCLC) using brain metastasis MR radiomics: a study with an imbalanced dataset.

Discov Oncol. 2024 Sep 14;15(1):447. doi: 10.1007/s12672-024-01333-1.

Admission blood tests predicting survival of SARS-CoV-2 infected patients: a practical implementation of graph convolution network in imbalance dataset.

BMC Infect Dis. 2024 Aug 9;24(1):803. doi: 10.1186/s12879-024-09699-x.

Artificial Intelligence in Head and Neck Surgery.

Otolaryngol Clin North Am. 2024 Oct;57(5):803-820. doi: 10.1016/j.otc.2024.05.001. Epub 2024 Jun 22.

A Scalable Radiomics- and Natural Language Processing-Based Machine Learning Pipeline to Distinguish Between Painful and Painless Thoracic Spinal Bone Metastases: Retrospective Algorithm Development and Validation Study.

JMIR AI. 2023 May 22;2:e44779. doi: 10.2196/44779.
