Feature Genes in Neuroblastoma Distinguishing High-Risk and Non-high-Risk Neuroblastoma Patients: Development and Validation Combining Random Forest With Artificial Neural Network.

Authors

Yang Sha, Zeng Lingfeng, Jin Xin, Lin Huapeng, Song Jianning

Affiliations

Department of Surgery, Children's Hospital of Chongqing Medical University, Chongqing, China.

Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China.

Publication information

Front Med (Lausanne). 2022 Jul 15;9:882348. doi: 10.3389/fmed.2022.882348. eCollection 2022.

Abstract

Prognosis differs significantly among neuroblastoma risk groups, so correctly identifying a child's risk group is of great importance. Using genomic data from neuroblastoma samples in public databases, we took GSE49710 as the training set and identified feature genes that distinguish high-risk from non-high-risk samples with the random forest (RF) and artificial neural network (ANN) algorithms. RF screening identified EPS8L1, PLCD4, CHD5, NTRK1, and SLC22A4 as the feature differentially expressed genes (DEGs) of high-risk neuroblastoma. The prediction model built on gene expression data showed high overall accuracy and precision in both the training set and the test set (AUC = 0.998 in GSE49710 and AUC = 0.858 in GSE73517). Kaplan-Meier analysis showed that overall survival and progression-free survival were significantly better in the low-risk subgroup than in the high-risk subgroup [HR: 3.86 (95% CI: 2.44-6.10) and HR: 3.03 (95% CI: 2.03-4.52), respectively]. Our ANN-based model classified better than the SVM-based and XGBoost-based models. Nevertheless, more convincing datasets and machine learning algorithms will be needed to build diagnostic models for individual tissue types in the future.
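The abstract summarizes model performance as AUC values on the training and test sets. As a reading aid only, the sketch below shows one common way to compute an AUC from predicted scores, using the rank-sum (Mann-Whitney) identity: the AUC is the fraction of high-risk/non-high-risk sample pairs in which the high-risk sample receives the higher score. The function name `roc_auc` and the toy labels and scores are hypothetical. They are not data or code from the study, and the authors' actual evaluation pipeline is not described here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values (1 = high-risk, 0 = non-high-risk).
    scores: predicted scores, higher meaning "more likely high-risk".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from GSE49710 or GSE73517.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.92, 0.80, 0.35, 0.40, 0.20, 0.10, 0.05]
print(roc_auc(labels, scores))
```

In this toy example, 11 of the 12 high-risk/non-high-risk pairs are ranked correctly, so the AUC is 11/12, or about 0.917. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.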

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/402d/9336509/f5a59f366a49/fmed-09-882348-g001.jpg
