Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees.

Publication information

BMC Bioinformatics. 2013 Mar 19;14:100. doi: 10.1186/1471-2105-14-100.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3614553/

Abstract

BACKGROUND

Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms, artificial neural networks (ANN), decision trees (DT) and logistic regression (LR), and of two composite models, DT-ANN and DT-LR. Four breast cancer datasets from the Gene Expression Omnibus were pooled for predicting five-year breast cancer relapse. After data compilation, the pooled set comprised 757 subjects, 5 clinical variables and 13,452 genetic variables. The bootstrap method, the Mann-Whitney U test and 20-fold cross-validation were used to identify the 100 candidate genes with the most significant p-values. The predictive power of the DT, LR and ANN models was assessed using accuracy and the area under the ROC curve. The associated genes were evaluated using Cox regression.
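The gene screening and the ROC-based assessment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it covers only the Mann-Whitney U ranking step (no bootstrap, no cross-validation), runs on synthetic data, and the function names `mann_whitney_u`, `top_genes` and `auc` are hypothetical. The p-value uses a plain normal approximation without tie or continuity correction, which is an assumption of the sketch. The `auc` helper relies on the standard identity that the area under the ROC curve equals U / (n1 * n2).

```python
import math
import random


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value).

    The p-value uses the plain normal approximation with no tie or
    continuity correction; that simplification is an assumption here.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based midrank shared by tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


def top_genes(expr, relapse, k):
    """Rank genes (columns of expr) by p-value; return the k best column indices."""
    p_values = []
    for g in range(len(expr[0])):
        pos = [row[g] for row, r in zip(expr, relapse) if r]
        neg = [row[g] for row, r in zip(expr, relapse) if not r]
        p_values.append((mann_whitney_u(pos, neg)[1], g))
    return [g for _, g in sorted(p_values)[:k]]


def auc(scores_relapse, scores_no_relapse):
    """Area under the ROC curve, computed as U / (n1 * n2)."""
    u, _ = mann_whitney_u(scores_relapse, scores_no_relapse)
    return u / (len(scores_relapse) * len(scores_no_relapse))


# Synthetic stand-in data (not from the paper): 40 subjects, 50 genes,
# with genes 0-2 shifted upward in the 20 subjects who relapse.
rng = random.Random(0)
relapse = [i < 20 for i in range(40)]
expr = [[rng.gauss(4.0 if (r and g < 3) else 0.0, 1.0) for g in range(50)]
        for r in relapse]
print(top_genes(expr, relapse, 3))
```

In the study itself the ranking would run over 13,452 genetic variables and keep the top 100; the small sizes here only keep the example quick to read and run.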

RESULTS

The DT models exhibited the lowest predictive power and the poorest extrapolation when applied to the test samples. The ANN models displayed the best predictive power and the best extrapolation. In Cox regression, the 21 most-associated genes, identified by integrating the results of each model, were associated with a 3.53-fold (95% CI: 2.24-5.58) increased risk of five-year breast cancer recurrence.
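As a rough reading aid for the reported hazard ratio, the sketch below back-calculates the standard error and z statistic implied by 3.53 (95% CI: 2.24-5.58). It assumes the interval is a Wald interval, symmetric on the log scale, which the abstract does not state; the function name `wald_summary` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_summary(hr, ci_low, ci_high, level=0.95):
    """Back-calculate log-scale quantities from a hazard ratio and its CI.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    # Under the Wald assumption the point estimate is the geometric
    # mean of the two bounds, so this is a consistency check.
    centre = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return {"se": se, "z": z, "p": p, "centre": centre}


summary = wald_summary(3.53, 2.24, 5.58)
print(summary)
```

The geometric mean of the bounds comes out close to 3.53, so the reported interval is consistent with a log-scale Wald interval up to rounding.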

CONCLUSIONS

The 21 selected genes can predict breast cancer recurrence. Among these genes, CCNB1, PLK1 and TOP2A are in the cell cycle G2/M DNA damage checkpoint pathway. Oncologists who understand the gene expression profiles associated with breast cancer recurrence can offer this genetic information to patients.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4de5/3614553/c5ff3edc5d0b/1471-2105-14-100-1.jpg
