School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan.
Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan.
Cancer Sci. 2023 Oct;114(10):4063-4072. doi: 10.1111/cas.15917. Epub 2023 Jul 25.
This study used clinical data to develop a prediction model for breast cancer survival and explored breast cancer prognostic factors using machine learning techniques. We conducted a retrospective study using data from the Taipei Medical University Clinical Research Database, which contains electronic medical records from three affiliated hospitals in Taiwan. The study included female patients aged over 20 years who were diagnosed with primary breast cancer and had medical records at these hospitals between January 1, 2009 and December 31, 2020. The data were divided into training and external testing datasets. Nine machine learning algorithms were applied to develop the models. Model performance was measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. A total of 3914 patients were included in the study. The artificial neural network model achieved the highest AUC, 0.95 (accuracy, 0.90; sensitivity, 0.71; specificity, 0.73; PPV, 0.28; NPV, 0.94; and F1-score, 0.37). The other models also showed relatively high AUCs, ranging from 0.75 to 0.83. According to the best-performing model, cancer stage, tumor size, age at diagnosis, surgery, and body mass index were the most important factors for predicting breast cancer survival. The study established accurate 5-year survival prediction models for breast cancer and identified key factors that could affect breast cancer survival in Taiwanese women. These results may serve as a reference for clinical practice in breast cancer treatment.
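For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how they can be computed for a binary classifier. It is not the authors' code. The function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the ten-patient toy dataset are all assumptions made for illustration, and none of the numbers come from the study. The AUC is computed here as the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),   # true positive rate (recall)
        specificity=_ratio(tn, tn + fp),   # true negative rate
        ppv=_ratio(tp, tp + fp),           # precision
        npv=_ratio(tn, tn + fn),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = event, 0 = no event.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

AUC is computed from the ranking of scores and does not depend on a threshold, whereas the other six metrics are fixed only once a threshold is chosen. PPV and NPV also depend on how common the outcome is in the test data, so they should be read together with sensitivity and specificity.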