Multiclass prediction with partial least square regression for gene expression data: applications in breast cancer intrinsic taxonomy.

Authors

Huang Chi-Cheng, Tu Shih-Hsin, Huang Ching-Shui, Lien Heng-Hui, Lai Liang-Chuan, Chuang Eric Y

Affiliations

Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan; Cathay General Hospital SiJhih, New Taipei, Taiwan; School of Medicine, Fu-Jen Catholic University, New Taipei, Taiwan; School of Medicine, Taipei Medical University, Taipei, Taiwan.

School of Medicine, Taipei Medical University, Taipei, Taiwan; Department of Surgery, Cathay General Hospital, Taipei, Taiwan.

Publication information

Biomed Res Int. 2013;2013:248648. doi: 10.1155/2013/248648. Epub 2013 Dec 30.

DOI:10.1155/2013/248648

PMID:24490149

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3893734/

Abstract

Multiclass prediction remains an obstacle in the analysis of high-throughput data such as microarray gene expression profiles. Despite recent advances in machine learning and bioinformatics, most classification tools are limited to binary responses. Our aim was to apply partial least squares (PLS) regression to the breast cancer intrinsic taxonomy, which comprises five distinct molecular subtypes. The PAM50 signature genes were used as predictor variables in the PLS analysis, and the latent gene component scores were entered into a binary logistic regression for each molecular subtype. The 139 prototypical arrays used for PAM50 development served as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). Agreement between PAM50 centroid-based single sample prediction (SSP) and PLS regression was excellent within the training samples (weighted Kappa: 0.988) but deteriorated substantially in the independent samples, which could be attributed to the much larger number of samples left unclassified by PLS regression. When these unclassified samples were removed, agreement between PAM50 SSP and PLS regression improved markedly (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were included). Our study established the feasibility of PLS regression for multiclass prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
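
The two steps that drive the reported numbers, turning one binary model per subtype into a single call and scoring agreement with a weighted Kappa, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how a sample becomes "unclassified", so the 0.5 probability cutoff in the hypothetical `assign_subtype` is assumed, and the linear weights and category ordering in the hypothetical `weighted_kappa` are likewise assumed. The probabilities are taken as given (for example, from per-subtype logistic regressions on PLS component scores).

```python
from collections import Counter


def assign_subtype(probs, threshold=0.5):
    """Combine per-subtype binary probabilities into one call.

    probs maps subtype name -> probability from that subtype's binary model.
    The threshold is an assumption; the abstract does not state the rule.
    """
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "Unclassified"


def weighted_kappa(calls_a, calls_b, categories):
    """Cohen's kappa with linear disagreement weights.

    categories fixes the (assumed) ordering used to weight disagreements.
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(calls_a)
    # Joint and marginal counts of category indices.
    joint = Counter((index[a], index[b]) for a, b in zip(calls_a, calls_b))
    row = Counter(index[a] for a in calls_a)
    col = Counter(index[b] for b in calls_b)
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * joint[(i, j)] / n
            expected += w * row[i] * col[j] / (n * n)
    if expected == 0.0:
        return 1.0  # both raters used one identical category throughout
    return 1.0 - observed / expected


if __name__ == "__main__":
    cats = ["LumA", "LumB", "HER2", "Basal", "Normal", "Unclassified"]
    ssp = ["LumA", "LumB", "Basal", "HER2"]
    pls = [
        assign_subtype({"LumA": 0.9, "LumB": 0.2}),
        assign_subtype({"LumA": 0.3, "LumB": 0.4}),  # no model is confident
        assign_subtype({"Basal": 0.8, "HER2": 0.1}),
        assign_subtype({"HER2": 0.7, "Basal": 0.2}),
    ]
    print(pls, weighted_kappa(ssp, pls, cats))
```

Placing "Unclassified" at the end of the ordering means every unclassified call counts as a large disagreement, which is one way such calls can pull the statistic down, consistent in spirit with the gap the abstract reports between 0.541 and 0.829.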

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43d3/3893734/786c94be658b/BMRI2013-248648.001.jpg
