Jeong HyeJeong, Lee Jeong-Moo, Kim Hyeong Seok, Chae Hochang, Yoon So Jeong, Shin Sang Hyun, Han In Woong, Heo Jin Seok, Min Ji Hye, Hyun Seung Hyup, Kim Hongbeom
Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.
Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Daejeon Eulji University Medical Center, Eulji University School of Medicine, Daejeon, South Korea.
Sci Rep. 2025 Aug 29;15(1):31885. doi: 10.1038/s41598-025-15800-4.
Pancreatic cancer is aggressive and has a high recurrence rate, so accurate prediction models are needed for treatment planning, particularly for deciding between neoadjuvant chemotherapy and upfront surgery. This study explores the use of variational autoencoder (VAE)-generated synthetic data to predict early tumor recurrence (within six months) in pancreatic cancer patients who underwent upfront surgery. Preoperative data from 158 patients between January 2021 and December 2022 were analyzed, and machine learning models, including Logistic Regression, Random Forest (RF), Gradient Boosting Machine (GBM), and Deep Neural Networks (DNN), were trained on both original and synthetic datasets. The VAE-generated dataset (n = 94) did not differ significantly from the original data (p > 0.05) and enhanced model performance, improving accuracy (GBM: 0.81 to 0.87; RF: 0.84 to 0.87) and sensitivity (GBM: 0.73 to 0.91; RF: 0.82 to 0.91). PET/CT-derived metabolic parameters were the strongest predictors, accounting for 54.7% of the model's predictive power, with maximum standardized uptake value (SUVmax) showing the highest importance (0.182, 95% CI: 0.165-0.199). This study demonstrates that synthetic data can significantly enhance predictive models for pancreatic cancer recurrence, especially in data-limited scenarios, offering a promising strategy for oncology prediction models.
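For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how accuracy and sensitivity are computed from confusion-matrix counts. It is an illustration only: the function names and the counts are hypothetical and are not taken from the study, which does not report its confusion matrices in this abstract.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true early-recurrence cases that the model detected."""
    return tp / (tp + fn)


# Hypothetical counts, chosen only for illustration (not the study's data):
# 10 patients with early recurrence, 10 without.
tp, fn = 9, 1   # recurrence cases predicted correctly / missed
tn, fp = 8, 2   # non-recurrence cases predicted correctly / falsely flagged

acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

Sensitivity depends only on the patients who actually recurred, so in a small test set a single additional detected case can shift it noticeably while accuracy moves much less.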