Wang Shoufei, Liu Wenfei, Ye Ziheng, Xia Xiaotian, Guo Minggao
Department of Thyroid, Parathyroid, Breast, and Hernia Surgery, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Front Genet. 2022 Oct 7;13:957718. doi: 10.3389/fgene.2022.957718. eCollection 2022.
Papillary thyroid carcinoma (PTC) accounts for 80% of thyroid malignancies, and its incidence is increasing rapidly. The present study aimed to identify a novel panel of important genes and to develop an early diagnostic model for PTC by combining an artificial neural network (ANN) with random forest (RF). The Gene Expression Omnibus (GEO) database was searched for samples, and three gene expression datasets (GSE27155, GSE60542, and GSE33630) were collected and processed. GSE27155 and GSE60542 were merged to form the training set, and GSE33630 served as the validation set. Differentially expressed genes (DEGs) in the training set were identified with the R package "limma". Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, as well as immune cell infiltration analysis, were then performed on the DEGs. Important genes were selected from the DEGs by random forest, and an artificial neural network was then used to develop a diagnostic model. The model was evaluated on the validation set, where the area under the receiver operating characteristic curve (AUC) was satisfactory. In summary, a diagnostic model for PTC was established by combining random forest and an artificial neural network on a novel gene panel, and its AUC indicated strong diagnostic performance.
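As an illustration of the evaluation metric named above, the AUC can be read as the probability that a randomly chosen tumor sample receives a higher model score than a randomly chosen normal sample. The following is a minimal sketch of that computation in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study, which does not describe its implementation here.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive sample has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = PTC, 0 = normal tissue,
# scores = made-up outputs of some diagnostic model on a validation set.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger datasets.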