Liu Ke, Shen Liu-Qing, Zhang Dian-Bao, Kang Yi-Xin, Wang Yi-Xuan, Chen Pan, Zhang Ran, Gu Bian-Li, Jiao Ye-Lin, Yuan Xiang, Qi Yi-Jun, Gao She-Gan
Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medicine) of Henan University of Science and Technology, Luoyang, China.
School of Information Engineering, Henan University of Science and Technology, Luoyang, China.
J Thorac Dis. 2023 Sep 28;15(9):4938-4948. doi: 10.21037/jtd-23-1058. Epub 2023 Sep 25.
Because existing prognostic models for esophageal squamous cell carcinoma (ESCC) have low accuracy, this study aimed to optimize the least squares support vector machine (LSSVM) algorithm by using a Cloud model to determine the uncertain prognostic factors, and thereby to establish a new high-precision prognostic model for ESCC.
We studied 4,771 ESCC patients (training samples) from the Surveillance, Epidemiology, and End Results (SEER) database and 635 ESCC patients (validation samples) from the Henan Provincial Center for Disease Control and Prevention (HCDC) database, applying the same inclusion and exclusion criteria to both databases. Permission to access the SEER research data file was obtained from the National Cancer Institute. Independent risk factors were analyzed using the log-rank test, survival curves, and univariate and multivariate Cox analysis. Finally, the independent prognostic factors were used to construct a nomogram, and random forest and Cloud-LSSVM prognostic models were used for validation.
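For readers unfamiliar with LSSVM, the following is a minimal sketch of the standard least squares SVM in its regression form with an RBF kernel, where training reduces to solving a single linear system. It is an illustration only, not the authors' implementation: the Cloud model step is omitted, and the function names, the regression formulation, and the hyperparameters `gamma` and `sigma` are assumptions made for this example.

```python
import math

def rbf(u, v, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=100.0, sigma=0.5):
    """Fit an LSSVM regressor (hypothetical helper, illustrative defaults).

    Solves the block system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and returns the bias b and the dual weights alpha.
    """
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, x_new, sigma=0.5):
    """Predict f(x) = sum_i alpha_i * k(x, x_i) + b for one new sample."""
    return b + sum(a * rbf(x_new, xi, sigma) for a, xi in zip(alpha, X_train))
```

Unlike a classical SVM, which requires quadratic programming, the equality constraints of LSSVM make the fit a linear solve; `gamma` trades off training error against smoothness.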
The median overall survival time was 14 months in the SEER samples and 46 months in the HCDC samples, the mean survival time was 26.5 months and 36.8 months, respectively, and the 3-year survival rate was 65.8%. The difference arises because most patients in the Henan samples had early-stage ESCC, whereas most SEER patients had T3 or T4 disease. The multivariate Cox analysis showed that age at diagnosis (P<0.001), sex (P=0.001), race (P=0.002), differentiation grade (P<0.001), pathologic T category (P<0.001), and pathologic M category (P<0.001) were factors affecting the prognosis of ESCC patients. In the SEER and HCDC data, respectively, the accuracy of the Cloud-LSSVM model (C-index =0.71, 0.689) was higher than that of differentiation grade (C-index =0.548, 0.506), random forest (C-index =0.649, 0.498), and the nomogram (C-index =0.659, 0.563). The new model combines the Cloud model's unified treatment of randomness and fuzziness with the powerful learning and non-linear mapping abilities of LSSVM.
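The models above are compared by C-index (concordance index). As a reminder of what that number measures, here is a minimal sketch of Harrell's C-index for right-censored survival data. The function name and the toy inputs in the usage note are hypothetical, and this is not necessarily the exact estimator or software the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output, higher means worse predicted prognosis

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the model assigns patient i the higher risk; ties in risk
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a model scoring near 0.5 on a cohort carries essentially no discriminative information there. For example, `concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.4, 0.1])` ranks every comparable pair correctly.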
Because of ethnic differences between the training and test samples, prediction accuracy was generally not high, but the accuracy of the Cloud-LSSVM model was much higher than that of the other models. The new model provides clear prognostic superiority over the random forest, nomogram, and other models.