Synthetic data generation method improves risk prediction model for early tumor recurrence after surgery in patients with pancreatic cancer.

Author information

Jeong HyeJeong, Lee Jeong-Moo, Kim Hyeong Seok, Chae Hochang, Yoon So Jeong, Shin Sang Hyun, Han In Woong, Heo Jin Seok, Min Ji Hye, Hyun Seung Hyup, Kim Hongbeom

Affiliations

Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea.

Division of Hepatobiliary-Pancreatic Surgery, Department of Surgery, Daejeon Eulji University Medical Center, Eulji University School of Medicine, Daejeon, South Korea.

Publication information

Sci Rep. 2025 Aug 29;15(1):31885. doi: 10.1038/s41598-025-15800-4.

Abstract

Pancreatic cancer is aggressive and has high recurrence rates, so accurate prediction models are needed for effective treatment planning, particularly when deciding between neoadjuvant chemotherapy and upfront surgery. This study explores the use of variational autoencoder (VAE)-generated synthetic data to predict early tumor recurrence (within six months) in pancreatic cancer patients who underwent upfront surgery. Preoperative data from 158 patients treated between January 2021 and December 2022 were analyzed, and machine learning models (Logistic Regression, Random Forest (RF), Gradient Boosting Machine (GBM), and Deep Neural Networks (DNN)) were trained on both original and synthetic datasets. The VAE-generated dataset (n = 94) closely matched the original data (p > 0.05) and enhanced model performance, improving accuracy (GBM: 0.81 to 0.87; RF: 0.84 to 0.87) and sensitivity (GBM: 0.73 to 0.91; RF: 0.82 to 0.91). PET/CT-derived metabolic parameters were the strongest predictors, accounting for 54.7% of the model's predictive power, with maximum standardized uptake value (SUVmax) showing the highest importance (0.182, 95% CI: 0.165-0.199). This study demonstrates that synthetic data can significantly enhance predictive models for pancreatic cancer recurrence, especially in data-limited scenarios, offering a promising strategy for oncology prediction models.
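
The abstract reports accuracy and sensitivity without defining them. As a minimal sketch, assuming the standard binary-classification definitions (accuracy = correct predictions / all predictions; sensitivity = true positives / all actual positives), the two metrics can be computed as follows. The function name and the label vectors are hypothetical illustrations, not data or code from the study.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels.

    Assumption: 1 = early recurrence (positive class), 0 = no early recurrence.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity is undefined when there are no actual positives.
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return accuracy, sensitivity


# Hypothetical labels for eight patients (illustration only, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

Sensitivity matters here because a missed early recurrence (a false negative) is the costly error when selecting patients for upfront surgery, which may be why the abstract reports it alongside accuracy.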

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fbd/12397232/8fd0f02aa411/41598_2025_15800_Fig1_HTML.jpg
