Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy.

Affiliations

Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, 2007, NSW, Australia.

Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, 2007, NSW, Australia.

Publication information

BMC Med Genomics. 2019 Dec 20;12(Suppl 8):183. doi: 10.1186/s12920-019-0630-4.

DOI:10.1186/s12920-019-0630-4

PMID:31856830

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6923882/

Abstract

BACKGROUND

The early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. However, most existing gene differential expression analysis (DEA) methods have two main drawbacks. First, they rely on fixed statistical hypotheses and are not always effective. Second, they cannot identify a definite expression level boundary when there is no obvious gap in expression level between the control and experiment groups.

METHODS

This paper proposes a novel approach to identify marker genes and their expression level boundaries for lung cancer. By calculating a kernel maximum mean discrepancy, our method evaluates the expression differences between normal, normal adjacent to tumor (NAT), and tumor samples. For the potential marker genes, the expression level boundaries between the groups are defined with an information entropy method.
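The two ingredients named above can be illustrated with a minimal sketch. The abstract does not state the kernel, its bandwidth, the MMD estimator, or the exact entropy criterion, so everything here is an assumption: a Gaussian kernel with a hypothetical `sigma` parameter, the biased (V-statistic) estimate of squared MMD, and a boundary chosen as the cut point that minimizes the size-weighted label entropy of the two sides. The function names and toy values are illustrative only and are not taken from the paper.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two scalar expression values (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (len(xs) * len(xs))
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (len(ys) * len(ys))
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def entropy(labels):
    """Shannon entropy (in bits) of a list of group labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_boundary(values, labels):
    """Return (cut, weighted_entropy) for the cut point that best separates the labels.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    best_cut, best_h = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no boundary can fall between equal values
        cut = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_cut, best_h = cut, h
    return best_cut, best_h


# Toy expression values for one hypothetical gene (not real data).
normal = [1.0, 1.2, 0.9, 1.1]
tumor = [3.0, 3.4, 2.8, 3.1]
score = mmd2(normal, tumor)
cut, h = entropy_boundary(normal + tumor, ["normal"] * 4 + ["tumor"] * 4)
```

A larger `mmd2` value indicates a larger difference between the two expression distributions, so under these assumptions genes could be ranked by this score for each pair of groups, with `entropy_boundary` then applied to the top-ranked genes.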

RESULTS

Compared with two conventional methods, the t-test and fold change, the top average-ranked genes selected by our method achieve better performance under all metrics in 10-fold cross-validation. GO and KEGG enrichment analyses are then conducted to explore the biological functions of the top 100 ranked genes. Finally, we choose the top 10 average-ranked genes as lung cancer markers and calculate and report their expression boundaries.
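For context, the two baseline scores and an average-rank aggregation might look like the sketch below. The abstract does not specify which t-test variant was used, how fold change was computed, or how ranks were averaged, so this is a set of assumptions: Welch's t statistic, a log2 ratio of group means with a small hypothetical `eps` guard, and a plain mean of per-comparison ranks. The gene names are placeholders.

```python
import math
import statistics


def welch_t(xs, ys):
    """Welch's t statistic for two samples (each needs >= 2 values and nonzero total variance)."""
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    return (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(vx / len(xs) + vy / len(ys))


def log2_fold_change(xs, ys, eps=1e-9):
    """log2 ratio of mean expression in ys over xs; eps guards against division by zero."""
    return math.log2((statistics.mean(ys) + eps) / (statistics.mean(xs) + eps))


def rank_genes(scores):
    """Map gene -> rank, where rank 1 is the largest absolute score."""
    ordered = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def average_rank(rankings):
    """Average each gene's rank over several rankings (e.g. one per pair of groups)."""
    genes = rankings[0].keys()
    return {g: sum(r[g] for r in rankings) / len(rankings) for g in genes}


# Toy data for two placeholder genes (not real measurements).
normal = {"GENE_A": [1.0, 1.2, 0.9], "GENE_B": [2.0, 2.1, 1.9]}
tumor = {"GENE_A": [3.0, 3.3, 2.9], "GENE_B": [2.0, 2.2, 1.8]}

t_rank = rank_genes({g: welch_t(normal[g], tumor[g]) for g in normal})
fc_rank = rank_genes({g: log2_fold_change(normal[g], tumor[g]) for g in normal})
combined = average_rank([t_rank, fc_rank])
```

Here `combined` only shows the averaging mechanics; in the study the average rank presumably aggregates rankings from a single method rather than mixing methods.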

CONCLUSION

The proposed approach is effective for identifying gene markers for lung cancer diagnosis. It is not only more accurate than conventional DEA methods but also provides a reliable way to identify gene expression level boundaries.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ccf7/6923882/fd165da861f4/12920_2019_630_Fig1_HTML.jpg
