Girgis Andrew G, Galoaa Bishoy M, Gonzalez Marcos R, Lozano-Calderon Santiago A
Orthopaedic Oncology Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA.
Harvard Medical School, Boston, MA, USA.
Ann Surg Oncol. 2025 Sep 8. doi: 10.1245/s10434-025-18249-x.
Undifferentiated pleomorphic sarcoma (UPS) is a prevalent soft tissue sarcoma subtype associated with poor prognosis. Current prognostic tools lack the ability to incorporate personalized data for predicting survival. Machine learning (ML) offers a potential solution to enhance survival prediction accuracy. This study aimed to develop and validate a new ML algorithm to predict 2- and 5-year overall survival (OS) in patients with UPS.
We included 3494 patients with a histologic diagnosis of UPS from the Surveillance, Epidemiology, and End Results (SEER) database for model training and internal validation. An institutional database of 288 patients was used for external validation. The development of the ML model involved converting tabular patient data into high-dimensional embeddings using a pre-trained language model. A custom neural network, optimized for high-dimensional data, was then developed to classify survival outcomes. Area under the curve (AUC), precision, and F1-scores were used to assess model performance.
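The abstract does not say how each tabular record was turned into text or which pre-trained language model produced the embeddings. As a minimal sketch under those assumptions, the serialization step that would precede embedding might look like the following. The field names, the text template, and the `serialize_patient` helper are hypothetical and chosen only for illustration; the embedding call is left out because the model is not named.

```python
from typing import Mapping

# Hypothetical field list: the study's actual features and their encoding
# are not given in the abstract.
FIELDS = ("age", "sex", "tumor_size_cm", "metastasis", "lymph_node_involvement")


def serialize_patient(row: Mapping[str, object]) -> str:
    """Turn one tabular record into plain text that a text encoder could embed."""
    parts = []
    for name in FIELDS:
        value = row.get(name)
        if value is None:
            # Keep missing fields explicit rather than silently dropping them.
            value = "unknown"
        parts.append(f"{name.replace('_', ' ')}: {value}")
    return "; ".join(parts)


# Toy record for illustration only, not a patient from either cohort.
example = {"age": 67, "sex": "male", "tumor_size_cm": 8.5, "metastasis": "no"}
text = serialize_patient(example)
print(text)
```

The resulting string would then be passed to whichever pre-trained language model is used, and the returned vector would serve as input to the downstream classifier.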
Tumor size, age, metastases, lymph node involvement, and sex were associated with OS. On internal validation, our model outperformed standard ML models for both 2-year and 5-year OS (AUC of 0.81 and 0.82, respectively). On external validation, the model showed excellent discriminative performance for 2-year (AUC = 0.79) and 5-year OS (AUC = 0.81). The model also outperformed the other models evaluated.
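For readers unfamiliar with the reported metrics, the sketch below shows one standard way to compute AUC (as the probability that a randomly chosen positive case is scored above a randomly chosen negative case), along with precision and F1 at a fixed threshold. The function names, the 0.5 threshold, and the toy labels and scores are assumptions made for illustration; they are not the study's data or code.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_f1(y_true, y_pred):
    """Precision and F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, f1


# Toy labels and scores, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
precision, f1 = precision_f1(y_true, y_pred)
print(round(auc, 3), round(precision, 3), round(f1, 3))
```

The pairwise formulation is quadratic in cohort size, which is acceptable for a few thousand patients; a rank-based implementation would be preferable for larger datasets.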
We successfully developed and validated an ML algorithm that accurately predicts 2-year and 5-year OS in patients with UPS. To confirm generalizability, further external validation of this algorithm is encouraged.