Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection.

Affiliation

College of Public Health, Zhengzhou University, Zhengzhou, 450001, China.

Publication information

BMC Bioinformatics. 2022 May 12;23(1):175. doi: 10.1186/s12859-022-04689-9.

DOI:10.1186/s12859-022-04689-9

PMID:35549644

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9103042/

Abstract

BACKGROUND

Lung cancer is one of the cancers with the highest mortality rate in China. With the rapid development of high-throughput sequencing technology and the growing application of deep learning methods, deep neural networks based on gene expression have become an active research direction in lung cancer diagnosis and offer an effective route to early diagnosis. Building a deep neural network model is therefore of great significance for the early diagnosis of lung cancer. However, the main challenges in mining gene expression datasets are the curse of dimensionality and class imbalance. Existing methods do not adequately address these problems: the number of measured variables (genes) is overwhelming relative to the small number of samples, which results in poor performance in early lung cancer diagnosis.

METHOD

To address the small sample size, high dimensionality, and class imbalance of gene expression datasets, this paper proposes a gene selection method based on Kullback-Leibler (KL) divergence, which selects genes with higher KL divergence as model features. A deep neural network model is then built with focal loss as the loss function, and k-fold cross-validation with k = 5 is used to validate and select the best model.
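The abstract does not spell out how the KL divergence is estimated or which focal loss settings are used, so the following is only a minimal sketch under stated assumptions: each gene's expression is discretized into a shared 10-bin histogram per class with a 0.5 pseudocount, genes are ranked by D_KL(tumor || normal), and the focal loss uses gamma = 2 and alpha = 0.25 (the defaults from the original focal loss paper, not necessarily those of this study). The function names `histogram`, `select_genes`, and `focal_loss`, and the synthetic data, are hypothetical.

```python
import math
import random

def histogram(values, lo, hi, n_bins, pseudo=0.5):
    """Normalized histogram over [lo, hi] with a pseudocount so no bin is zero."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins or 1.0  # guard against a constant gene
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = len(values) + pseudo * n_bins
    return [(c + pseudo) / total for c in counts]

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def select_genes(expr, labels, top_k, n_bins=10):
    """Rank genes by KL divergence between tumor and normal histograms.

    expr: list of samples, each a list of per-gene values.
    labels: 1 = tumor, 0 = normal (an assumed encoding).
    """
    scores = []
    for g in range(len(expr[0])):
        tumor = [row[g] for row, y in zip(expr, labels) if y == 1]
        normal = [row[g] for row, y in zip(expr, labels) if y == 0]
        lo, hi = min(tumor + normal), max(tumor + normal)
        scores.append(kl_divergence(histogram(tumor, lo, hi, n_bins),
                                    histogram(normal, lo, hi, n_bins)))
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:top_k], scores

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, 1e-7), 1 - 1e-7)      # clip for numerical safety
        p_t = p if y == 1 else 1 - p         # probability of the true class
        alpha_t = alpha if y == 1 else 1 - alpha
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)

# Synthetic, imbalanced toy data (not from the paper):
# gene 1 is shifted in tumors, genes 0 and 2 are pure noise.
rng = random.Random(0)
labels = [1] * 80 + [0] * 20
expr = [[rng.gauss(0, 1), rng.gauss(6.0 if y == 1 else 0.0, 1), rng.gauss(0, 1)]
        for y in labels]
selected, scores = select_genes(expr, labels, top_k=1)
print("selected gene indices:", selected)
print("focal loss, confident vs. unsure prediction:",
      focal_loss([1], [0.95]), focal_loss([1], [0.55]))
```

The `(1 - p_t)**gamma` factor shrinks the contribution of samples the model already classifies confidently, which is why focal loss is a common choice when one class dominates the training set.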

RESULT

The proposed deep learning model based on KL divergence gene selection achieves an AUC of 0.99 on the validation set. The model shows high generalization performance.
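To make the evaluation protocol concrete, here is a minimal sketch of five-fold splitting and AUC computation. It assumes stratified folds (the abstract only says k-fold with k = 5) and uses synthetic scores in place of a trained network, so the printed numbers are unrelated to the reported 0.99. The helper names `stratified_kfold` and `roc_auc` are hypothetical.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with a similar class ratio in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin into folds
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for j in range(k) if j != f for i in folds[j])
        yield train, val

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in for model outputs on an imbalanced dataset (not real data).
rng = random.Random(1)
labels = [1] * 80 + [0] * 20
scores = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in labels]

fold_aucs = []
for train_idx, val_idx in stratified_kfold(labels, k=5):
    # A real pipeline would fit the network on train_idx before scoring val_idx.
    fold_aucs.append(roc_auc([labels[i] for i in val_idx],
                             [scores[i] for i in val_idx]))
print("per-fold validation AUC:", [round(a, 3) for a in fold_aucs])
```

Stratifying keeps a few minority-class samples in every validation fold, so the AUC is defined for each fold even when the classes are heavily imbalanced.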

CONCLUSION

The proposed deep neural network model based on KL divergence gene selection is shown to be an accurate and effective method for lung cancer prediction.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6598/9103042/c11006f57c92/12859_2022_4689_Fig1_HTML.jpg
