Lu Di, Jiang Jianjun, Liu Xiguang, Wang He, Feng Siyang, Shi Xiaoshun, Wang Zhizhi, Chen Zhiming, Yan Xuebin, Wu Hua, Cai Kaican
Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, China.
Department of Thoracic Surgery, Peking University Shenzhen Hospital, Shenzhen, China.
Front Genet. 2020 Dec 21;11:614823. doi: 10.3389/fgene.2020.614823. eCollection 2020.
Metastatic cervical carcinoma from unknown primary (MCCUP) accounts for 1-4% of all head and neck tumors, and identifying its primary site is challenging. The most common histopathological type of MCCUP is squamous cell carcinoma (SCC), whose primary site remains difficult to determine pathologically. Novel and effective methods for determining the primary site in MCCUP are therefore needed. In the present study, RNA sequencing data for four types of SCC and for Pan-Cancer were obtained from The Cancer Genome Atlas (TCGA). After data pre-processing, differentially expressed genes (DEGs) were identified for each. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses indicated that the significantly changed genes of the four types of SCC share many molecular functions and histological features. Three machine learning models based on ten genes, namely Random Forest (RF), support vector machine (SVM), and neural network (NN), were then developed to distinguish the four types of SCC. In prediction tests on the external validation set, the RF model performed best, with an overall predictive accuracy of 88.2%, a sensitivity of 88.71%, and a specificity of 95.42%. The NN model ranked second, with an overall accuracy of 82.02%, a sensitivity of 81.23%, and a specificity of 93.04%. The SVM model ranked last, with an overall accuracy of 76.69%, a sensitivity of 74.81%, and a specificity of 90.84%. The present analysis of the similarities and differences among the four types of SCC, together with the development of informatics-based models for distinguishing them, sheds light on future precision diagnosis of MCCUP.
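For readers unfamiliar with how a single accuracy, sensitivity, and specificity figure can be reported for a four-class classifier, the following is a minimal sketch, not the authors' code. It assumes the per-class values are macro-averaged (computed one-vs-rest for each class and then averaged), which the abstract does not state. The function name `multiclass_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
from typing import List, Tuple


def multiclass_metrics(cm: List[List[int]]) -> Tuple[float, float, float]:
    """Return (accuracy, macro sensitivity, macro specificity).

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro-averaging is an assumption of this sketch.
    Every class is assumed to have at least one true sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    sensitivities = []
    specificities = []
    for k in range(n):
        tp = cm[k][k]                               # class k predicted as k
        fn = sum(cm[k]) - tp                        # class k predicted as other
        fp = sum(cm[i][k] for i in range(n)) - tp   # other predicted as k
        tn = total - tp - fn - fp                   # everything else
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))

    return correct / total, sum(sensitivities) / n, sum(specificities) / n


# Hypothetical counts for four SCC classes (illustration only).
toy_cm = [
    [8, 1, 1, 0],
    [1, 7, 1, 1],
    [0, 1, 9, 0],
    [1, 0, 0, 9],
]
accuracy, sensitivity, specificity = multiclass_metrics(toy_cm)
print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} specificity={specificity:.4f}")
```

Under this averaging, specificity tends to exceed sensitivity in a multi-class setting because each one-vs-rest split has many more true negatives than true positives, which is consistent with the pattern in the figures reported above.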