Cancer classification and pathway discovery using non-negative matrix factorization.

Affiliations

Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA.

Committee on Developmental Biology and Regenerative Medicine, The University of Chicago, Chicago, IL, USA.

Publication information

J Biomed Inform. 2019 Aug;96:103247. doi: 10.1016/j.jbi.2019.103247. Epub 2019 Jul 2.

DOI: 10.1016/j.jbi.2019.103247

PMID: 31271844

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6697569/

Abstract

OBJECTIVES

Extracting genetic information from a full range of sequencing data is important for understanding disease. We propose a novel method to effectively explore the landscape of genetic mutations and aggregate them to predict cancer type.

DESIGN

We applied non-smooth non-negative matrix factorization (nsNMF) and support vector machine (SVM) to utilize the full range of sequencing data, aiming to better aggregate genetic mutations and improve their power to predict disease type. More specifically, we introduce a novel classifier to distinguish cancer types using somatic mutations obtained from whole-exome sequencing data. Mutations were identified from multiple cancers and scored using SIFT, PP2, and CADD, and collapsed at the individual gene level. nsNMF was then applied to reduce dimensionality and obtain coefficient and basis matrices. A feature matrix was derived from the obtained matrices to train a classifier for cancer type classification with the SVM model.
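The design above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a genes-by-samples mutation count matrix and factors it as V ≈ W S H, where S = (1 − θ)I + (θ/k)11ᵀ is the smoothing matrix that defines nsNMF. The Frobenius-norm multiplicative updates are a choice made here for brevity and may differ from the objective used in the study. The function names `gene_count_matrix` and `nsnmf`, the toy sample and gene identifiers, and the value `theta=0.5` are all hypothetical.

```python
from collections import Counter

import numpy as np


def gene_count_matrix(mutations, samples, genes):
    """Collapse (sample, gene) mutation records into a genes x samples count matrix."""
    counts = Counter(mutations)  # missing (sample, gene) pairs count as 0
    return np.array([[counts[(s, g)] for s in samples] for g in genes], dtype=float)


def nsnmf(V, k, theta=0.5, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (genes x samples) by W @ S @ H with non-negative W and H.

    Assumption: Frobenius-norm multiplicative updates, not necessarily the
    objective used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps  # basis matrix (genes x k)
    H = rng.random((k, n_samples)) + eps  # coefficient matrix (k x samples)
    # Smoothing matrix: theta=0 gives plain NMF, larger theta smooths more.
    S = (1.0 - theta) * np.eye(k) + (theta / k) * np.ones((k, k))
    for _ in range(n_iter):
        WS = W @ S
        H *= (WS.T @ V) / (WS.T @ WS @ H + eps)
        SH = S @ H
        W *= (V @ SH.T) / (W @ SH @ SH.T + eps)
    return W, S, H


# Toy data (hypothetical identifiers): one record per observed mutation.
samples = ["s1", "s2", "s3", "s4"]
genes = ["G1", "G2", "G3", "G4"]
mutations = [
    ("s1", "G1"), ("s1", "G1"), ("s1", "G2"), ("s2", "G1"), ("s2", "G2"),
    ("s3", "G3"), ("s3", "G4"), ("s4", "G3"), ("s4", "G4"), ("s4", "G4"),
]

V = gene_count_matrix(mutations, samples, genes)
W, S, H = nsnmf(V, k=2)
features = (S @ H).T  # one row per sample, one column per factor
rel_err = np.linalg.norm(V - W @ S @ H) / np.linalg.norm(V)
```

The rows of `features` could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`. Using the smoothed coefficients as per-sample features is an assumption of this sketch, since the abstract does not say how the feature matrix was derived from the factors.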

RESULTS

We have demonstrated that the classifier was able to distinguish four cancer types with reasonable accuracy. In five-fold cross-validations using mutation counts as features, the average prediction accuracy was 80% (SEM = 0.1%), significantly outperforming baselines and outperforming models using mutation scores as features.
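As a rough illustration of how a figure such as "average accuracy with SEM" can be produced, the sketch below runs five-fold cross-validation and summarizes the per-fold accuracies. It assumes the SEM is taken across folds, which the abstract does not state. The four labels `"A"` to `"D"`, the helper names, and the majority-class baseline are hypothetical stand-ins, not the study's data or its baselines.

```python
import math
import random
import statistics
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(labels, predict_fn, k=5, seed=0):
    """Return one accuracy per fold; predict_fn(train_idx, test_idx) -> predictions."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = predict_fn(train_idx, test_idx)
        correct = sum(predictions[j] == labels[i] for j, i in enumerate(test_idx))
        accuracies.append(correct / len(test_idx))
    return accuracies


def mean_and_sem(values):
    """Mean and standard error of the mean (sample stdev / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Toy labels: four hypothetical cancer types, five samples each.
labels = ["A", "B", "C", "D"] * 5


def majority_baseline(train_idx, test_idx):
    """Predict the most frequent training label for every held-out sample."""
    most_common = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [most_common] * len(test_idx)


accuracies = cross_validate(labels, majority_baseline)
mean_acc, sem_acc = mean_and_sem(accuracies)
```

A trained classifier would replace `majority_baseline` with a function that fits on `train_idx` and predicts on `test_idx`, so that both models are scored on identical folds.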

CONCLUSION

Using the factor matrices derived from the nsNMF, we identified multiple genes and pathways that are significantly associated with each cancer type. This study presents a generic and complete pipeline to study the associations between somatic mutations and cancers. The proposed method can be adapted to other studies for disease status classification and pathway discovery.
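One common way to move from a basis matrix to candidate genes and pathways is to rank genes by their loading on each factor and then test the top genes for over-representation in a pathway gene set. The sketch below shows that generic approach with a hypergeometric test. It is an assumption that anything similar was used in the study, and the gene identifiers, the loading values in `W`, and the `pathway` set are invented for illustration.

```python
from math import comb


def top_genes(W, genes, factor, n_top):
    """Return the n_top genes with the largest loading on one factor (column of W)."""
    order = sorted(range(len(genes)), key=lambda i: W[i][factor], reverse=True)
    return [genes[i] for i in order[:n_top]]


def hypergeom_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, x) * comb(n_universe - n_pathway, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy basis matrix (genes x factors) with hypothetical loadings.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
W = [
    [0.9, 0.0],
    [0.8, 0.1],
    [0.7, 0.0],
    [0.1, 0.6],
    [0.0, 0.9],
    [0.0, 0.8],
]
pathway = {"G1", "G2", "G3"}  # hypothetical pathway gene set

selected = top_genes(W, genes, factor=0, n_top=3)
overlap = len(pathway.intersection(selected))
p_value = hypergeom_pvalue(len(genes), len(pathway), len(selected), overlap)
```

With real data, the p-values from many pathways would need multiple-testing correction before any association is called significant.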

Similar articles

Identification of the functional alteration signatures across different cancer types with support vector machine and feature analysis.

Biochim Biophys Acta Mol Basis Dis. 2018 Jun;1864(6 Pt B):2218-2227. doi: 10.1016/j.bbadis.2017.12.026. Epub 2017 Dec 19.

Deep learning for cancer type classification and driver gene identification.

BMC Bioinformatics. 2021 Oct 25;22(Suppl 4):491. doi: 10.1186/s12859-021-04400-4.

Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis.

Genome Inform. 2008;21:200-11.

Classification of breast cancer patients using somatic mutation profiles and machine learning approaches.

BMC Syst Biol. 2016 Aug 26;10 Suppl 3(Suppl 3):62. doi: 10.1186/s12918-016-0306-z.

Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification.

Genomics Proteomics Bioinformatics. 2017 Dec;15(6):389-395. doi: 10.1016/j.gpb.2017.08.002. Epub 2017 Dec 12.

Tumor classification based on non-negative matrix factorization using gene expression data.

IEEE Trans Nanobioscience. 2011 Jun;10(2):86-93. doi: 10.1109/TNB.2011.2144998. Epub 2011 Jul 7.

Computer-assisted lip diagnosis on Traditional Chinese Medicine using multi-class support vector machines.

BMC Complement Altern Med. 2012 Aug 16;12:127. doi: 10.1186/1472-6882-12-127.

Distinct transcriptional programs stratify ovarian cancer cell lines into the five major histological subtypes.

Genome Med. 2021 Sep 1;13(1):140. doi: 10.1186/s13073-021-00952-5.

Support vector machine classifier for prediction of the metastasis of colorectal cancer.

Int J Mol Med. 2018 Mar;41(3):1419-1426. doi: 10.3892/ijmm.2018.3359. Epub 2018 Jan 2.

Cited by

Integration of Multi-Omics Data for the Classification of Glioma Types and Identification of Novel Biomarkers.

Bioinform Biol Insights. 2024 May 27;18:11779322241249563. doi: 10.1177/11779322241249563. eCollection 2024.

An machine learning model to predict quality of life subtypes of disabled stroke survivors.

Ann Clin Transl Neurol. 2024 Feb;11(2):404-413. doi: 10.1002/acn3.51960. Epub 2023 Dec 7.

Mass Spectrometry-Based Proteogenomics: New Therapeutic Opportunities for Precision Medicine.

Annu Rev Pharmacol Toxicol. 2024 Jan 23;64:455-479. doi: 10.1146/annurev-pharmtox-022723-113921. Epub 2023 Sep 22.

Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis.

Genomics Proteomics Bioinformatics. 2022 Oct;20(5):850-866. doi: 10.1016/j.gpb.2022.11.003. Epub 2022 Dec 1.

Multi-omics assessment of dilated cardiomyopathy using non-negative matrix factorization.

PLoS One. 2022 Aug 18;17(8):e0272093. doi: 10.1371/journal.pone.0272093. eCollection 2022.

Application of non-negative matrix factorization in oncology: one approach for establishing precision medicine.

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac246.

AX-Unet: A Deep Learning Framework for Image Segmentation to Assist Pancreatic Tumor Diagnosis.

Front Oncol. 2022 Jun 2;12:894970. doi: 10.3389/fonc.2022.894970. eCollection 2022.

Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide.

Genes (Basel). 2022 Apr 7;13(4):648. doi: 10.3390/genes13040648.

Radiomics of Contrast-Enhanced Computed Tomography: A Potential Biomarker for Pretreatment Prediction of the Response to Calmette-Guerin Immunotherapy in Non-Muscle-Invasive Bladder Cancer.

Front Cell Dev Biol. 2022 Feb 25;10:814388. doi: 10.3389/fcell.2022.814388. eCollection 2022.

Comprehensive Genomic and Epigenomic Analyses on Transcriptomic Regulation in Stomach Adenocarcinoma.

Front Genet. 2022 Feb 11;12:778095. doi: 10.3389/fgene.2021.778095. eCollection 2021.
