
Identification of feature risk pathways of smoking-induced lung cancer based on SVM.

Affiliations

Department of Surgery, Anji Third People's Hospital, Zhejiang, China.

Department of Radiology, Taizhou Municipal Hospital, Zhejiang, China.

Publication details

PLoS One. 2020 Jun 4;15(6):e0233445. doi: 10.1371/journal.pone.0233445. eCollection 2020.

Abstract

OBJECTIVE

The present study aims to explore the role of smoking in lung cancer risk and to screen for feature risk pathways of smoking-induced lung cancer.

METHODS

Expression profiles of patient samples from the GEO database were standardized, and differentially expressed genes (DEGs) were identified with the limma algorithm. Samples and genes were analyzed by unsupervised hierarchical clustering, and GO and KEGG enrichment analyses were performed on the DEGs. Protein-protein interaction (PPI) network data were downloaded from the BioGrid and HPRD databases, and the DEGs were mapped onto the PPI network to identify their interactions. Anomaly scores were calculated for the significantly enriched pathways, and the recursive feature elimination (RFE) method was used to optimize the feature sets. A support vector machine (SVM) model was trained on these features, its predictions were plotted as ROC curves, and the AUC was calculated to evaluate the model's predictive performance.
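The abstract does not define the pathway anomaly score or give implementation details. As a minimal sketch under stated assumptions, the Python below scores each sample on each pathway as the mean absolute z-score of the pathway's genes relative to a reference (control) group, which is one common style of pathway deregulation score but is assumed here, not taken from the paper. It then computes the ROC AUC of any classifier's output scores using the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function names, gene and pathway names, and toy values are all hypothetical; the RFE and SVM steps are not shown.

```python
from statistics import mean, pstdev


def pathway_anomaly_scores(expr, control_ids, pathways):
    """Hypothetical per-sample pathway anomaly score (not the paper's formula).

    expr:        {gene: {sample_id: expression}}; assumes every gene has the same samples
    control_ids: sample ids used as the reference group (e.g. non-cancer smokers)
    pathways:    {pathway_name: [gene, ...]}
    Returns {sample_id: {pathway_name: mean absolute z-score of its genes}}.
    """
    samples = list(next(iter(expr.values())).keys())
    # Reference mean and population SD of each gene over the control samples.
    ref = {}
    for gene, values in expr.items():
        ctrl = [values[s] for s in control_ids]
        sd = pstdev(ctrl)
        ref[gene] = (mean(ctrl), sd if sd > 0 else 1.0)  # avoid division by zero
    scores = {}
    for s in samples:
        scores[s] = {}
        for name, genes in pathways.items():
            present = [g for g in genes if g in expr]  # skip genes absent from the data
            z = [abs(expr[g][s] - ref[g][0]) / ref[g][1] for g in present]
            scores[s][name] = mean(z) if z else 0.0
    return scores


def roc_auc(labels, scores):
    """ROC AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [sc for y, sc in zip(labels, scores) if y == 1]
    neg = [sc for y, sc in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: two control samples (c1, c2) and one case sample (t1).
    expr = {
        "GENE_A": {"c1": 1.0, "c2": 3.0, "t1": 6.0},
        "GENE_B": {"c1": 2.0, "c2": 2.0, "t1": 2.0},
    }
    scores = pathway_anomaly_scores(expr, ["c1", "c2"], {"PATHWAY_X": ["GENE_A", "GENE_B"]})
    print(scores["t1"]["PATHWAY_X"])  # 2.0
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for cohort-sized data; a sort-based rank formula gives the same AUC faster for large sets.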

RESULTS

A total of 1923 DEGs were identified, of which 826 were down-regulated and 1097 were up-regulated. Unsupervised hierarchical clustering classified smokers with lung cancer with 74% accuracy and smokers without lung cancer with 75% accuracy. Screening yielded five optimal feature pathway sets; when their diagnostic ability was evaluated with the SVM model, accuracy improved to 84%. After clinical information was added, diagnostic accuracy reached 90%.
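The results report accuracy separately for each group (smokers with and without lung cancer) as well as an overall accuracy. A minimal stdlib sketch of how such per-class and overall accuracies can be computed from true and predicted labels follows; per-class accuracy is taken here to mean the fraction of each true class that was predicted correctly, which is an assumption about how the paper's figures were defined. The function name, label strings, and toy predictions are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, {class: fraction of that true class predicted correctly})."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    totals = Counter(y_true)  # number of samples in each true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    overall = sum(correct.values()) / len(y_true)
    per_class = {c: correct[c] / n for c, n in totals.items()}
    return overall, per_class


if __name__ == "__main__":
    y_true = ["cancer"] * 4 + ["no_cancer"] * 4
    y_pred = ["cancer", "cancer", "cancer", "no_cancer",
              "no_cancer", "no_cancer", "cancer", "no_cancer"]
    overall, per_class = accuracy_report(y_true, y_pred)
    print(overall, per_class)  # 0.75 {'cancer': 0.75, 'no_cancer': 0.75}
```

Reporting both classes separately matters when groups are unbalanced, because a high overall accuracy can hide poor performance on the smaller group.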

CONCLUSION

We verified that five signaling pathways, combined with clinical information, can serve as feature risk pathways for distinguishing smokers with lung cancer from smokers without lung cancer, and that this combination improves diagnostic accuracy.
