Knowledge-Guided Statistical Learning Methods for Analysis of High-Dimensional -Omics Data in Precision Oncology.

Authors

Zhao Yize, Chang Changgee, Long Qi

Affiliations

Weill Cornell Medicine, New York, NY.

University of Pennsylvania Perelman School of Medicine, Philadelphia, PA.

Publication information

JCO Precis Oncol. 2019 Oct 24;3. doi: 10.1200/PO.19.00018. eCollection 2019 Oct.

DOI:10.1200/PO.19.00018

PMID:35100722

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9797232/

Abstract

High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise in advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes on multiple -omics levels and on the pathway level. When individual genes in an important pathway have relatively weak signals, it can be challenging to detect them on their own, but the aggregated signal in the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, there is a growing body of literature on knowledge-guided statistical learning methods for analysis of high-dimensional -omics data that can incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and yield biologically more interpretable results compared with statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised learning and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.
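The point about weak gene-level signals adding up at the pathway level can be illustrated with a small power calculation. The sketch below is not a method from the review. It assumes, purely for illustration, that the genes in a pathway are independent, share the same standardized effect size, and are combined with Stouffer's z-score method, and that Bonferroni correction is applied at both the gene level and the pathway level. The function names and all numeric settings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def one_sided_power(mean_z: float, alpha: float) -> float:
    """Power of a one-sided z-test whose statistic is distributed N(mean_z, 1)."""
    critical = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(critical - mean_z)


def gene_vs_pathway_power(effect: float, n_samples: int, n_pathway_genes: int,
                          n_genes_total: int, n_pathways_total: int,
                          alpha: float = 0.05) -> tuple[float, float]:
    """Compare power for one gene against power for its whole pathway.

    Illustrative assumptions (not taken from the source): independent genes,
    an identical standardized effect for every gene in the pathway,
    Stouffer aggregation, and Bonferroni correction at each level.
    """
    # Expected z-score of a single gene grows with the square root of n.
    gene_mean_z = effect * sqrt(n_samples)
    # Stouffer: the sum of m independent z-scores divided by sqrt(m) has
    # variance 1 and mean sqrt(m) * gene_mean_z.
    pathway_mean_z = sqrt(n_pathway_genes) * gene_mean_z
    gene_power = one_sided_power(gene_mean_z, alpha / n_genes_total)
    pathway_power = one_sided_power(pathway_mean_z, alpha / n_pathways_total)
    return gene_power, pathway_power


if __name__ == "__main__":
    # Hypothetical setting: 100 samples, 20 genes per pathway,
    # 20,000 genes and 300 pathways tested in total.
    gene_power, pathway_power = gene_vs_pathway_power(0.15, 100, 20, 20000, 300)
    print(f"single-gene power: {gene_power:.4f}")
    print(f"pathway power:     {pathway_power:.4f}")
```

Under these assumptions the single-gene test has very low power while the pathway-level test has power close to 1 at the same sample size. Real pathways contain correlated genes and heterogeneous effects, so the gain in practice would typically be smaller than this idealized calculation suggests.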
