Wen Fei, Guan Xin, Qu Hai-Xia, Jiang Xiang-Jun
Qingdao University, Medical College, Qingdao 266000, Shandong Province, China.
Department of Gastroenterology, Qingdao Municipal Hospital, Qingdao 266071, Shandong Province, China.
World J Gastrointest Oncol. 2023 Jul 15;15(7):1215-1226. doi: 10.4251/wjgo.v15.i7.1215.
Single-cell sequencing makes it possible to analyze changes in specific cell types during disease progression. However, previous single-cell sequencing studies on gastric cancer (GC) have largely focused on immune and stromal cells, and the alterations that occur in gastric epithelial cells during the development of GC remain to be elucidated.
To create a GC prediction model based on single-cell and bulk RNA sequencing (bulk RNA-seq) data.
We integrated three single-cell RNA sequencing (scRNA-seq) datasets and ten bulk RNA-seq datasets. The analysis focused on determining cell proportions and identifying differentially expressed genes (DEGs). Specifically, we performed differential expression analysis between epithelial cells from GC tissues and from normal gastric tissues (NAGs), and used both single-cell and bulk RNA-seq data to establish a prediction model for GC. We then validated the accuracy of the GC prediction model in bulk RNA-seq data and used Kaplan-Meier plots to assess the correlation between genes in the prediction model and the prognosis of GC.
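As an illustration of the Kaplan-Meier step, the following is a minimal sketch of the product-limit estimator. It is not the authors' pipeline: the function name `kaplan_meier` and the follow-up times are hypothetical, and the two groups stand in for patients split by high or low expression of a single gene.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- follow-up duration per patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # everyone leaving the risk set at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up in months (illustrative values only)
high = kaplan_meier([5, 8, 12, 20, 24], [1, 1, 0, 1, 0])
low = kaplan_meier([10, 18, 30, 36, 40], [1, 0, 0, 1, 0])
```

Plotting the two step curves against time gives the kind of Kaplan-Meier plot used to compare prognosis between expression groups; a log-rank test would normally accompany it.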
Analysis of scRNA-seq data from a total of 70707 cells from GC tissue, NAG, and chronic gastric tissue identified 10 cell types, and DEGs between GC and normal epithelial cells were screened. After the DEGs between GC and normal gastric samples were determined from bulk RNA-seq data, GC predictive classifiers were constructed using the least absolute shrinkage and selection operator (LASSO) and random forest methods. The LASSO classifier performed well in both validation and model verification using The Cancer Genome Atlas and Genotype-Tissue Expression (GTEx) datasets [area under the curve (AUC): AUC_min = 0.988, AUC_1se = 0.994], and the random forest model also achieved good results on the validation set (AUC = 0.92). Nine genes had high importance values in multiple GC predictive models, and KM-PLOTTER analysis showed their relevance to GC prognosis, suggesting their potential for use in GC diagnosis and treatment.
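The classifiers are compared by AUC, which can be read as the probability that a randomly chosen GC sample receives a higher predicted score than a randomly chosen normal sample. The sketch below computes it by direct pairwise comparison. The function name `roc_auc`, the labels, and the scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    labels -- 1 for tumor samples, 0 for normal samples
    scores -- classifier output, higher meaning more tumor-like
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative predictions for three tumor and three normal samples
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 tumor/normal pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise form is quadratic in sample count; a rank-based computation would be preferable for large cohorts.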
A predictive classifier was established based on the analysis of RNA-seq data, and the genes in it are expected to serve as auxiliary markers in the clinical diagnosis of GC.