Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer.

Authors

Pineda Silvia, Real Francisco X, Kogevinas Manolis, Carrato Alfredo, Chanock Stephen J, Malats Núria, Van Steen Kristel

Affiliations

Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.

Systems and Modeling Unit-BIO3, Montefiore Institute, Liège, Belgium.

Publication information

PLoS Genet. 2015 Dec 8;11(12):e1005689. doi: 10.1371/journal.pgen.1005689. eCollection 2015 Dec.

DOI: 10.1371/journal.pgen.1005689

PMID: 26646822

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4672920/

Abstract

Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. The integration process raises many challenges, such as data heterogeneity, a small number of individuals relative to the number of parameters, multicollinearity, and the difficulty of interpreting and validating results given their complexity and the limited knowledge of the underlying biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method that concomitantly assesses significance and corrects for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and elastic net, ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The proposed approach reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions.
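
The three steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain coordinate-descent LASSO (no ENET), a fixed penalty `lam` rather than a tuned one, the model R² as the test statistic, and a single-step MaxT adjustment in which the same sample permutation is applied to every gene. All function names, the toy genes `GENE_A` and `GENE_B`, and the simulated data are hypothetical.

```python
import random

def select_in_window(features, gene_start, gene_end, window=1_000_000):
    """Step 1: keep SNPs/CpGs located within `window` bp of the gene."""
    return [name for name, pos in features
            if gene_start - window <= pos <= gene_end + window]

def standardize(col):
    """Center a column and scale it to unit variance (zeros if constant)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in col]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the LASSO proximal operator)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_r2(columns, y, lam, n_iter=50):
    """Step 2: fit a LASSO by coordinate descent and return its R^2.
    `columns` are standardized predictors (SNPs and/or CpGs)."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    tss = sum(r * r for r in resid)
    if tss == 0:
        return 0.0
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of predictor j with the partial residual.
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return 1.0 - sum(r * r for r in resid) / tss

def maxt_adjusted_pvalues(models, lam=0.1, n_perm=100, seed=0):
    """Step 3: single-step MaxT. `models` maps gene -> (columns, y).
    Each permutation shuffles the samples once, refits every gene model,
    and keeps the maximum statistic across genes."""
    rng = random.Random(seed)
    observed = {g: lasso_r2(c, y, lam) for g, (c, y) in models.items()}
    n = len(next(iter(models.values()))[1])
    exceed = {g: 0 for g in models}
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        max_stat = max(lasso_r2(c, [y[i] for i in order], lam)
                       for c, y in models.values())
        for g in models:
            if max_stat >= observed[g]:
                exceed[g] += 1
    return {g: (exceed[g] + 1) / (n_perm + 1) for g in models}

# Toy data (entirely simulated): GENE_A depends on the first SNP, GENE_B is noise.
rng = random.Random(1)
n = 40
snp = [rng.choice([0, 1, 2]) for _ in range(n)]
cpg = [rng.random() for _ in range(n)]
other_snp = [rng.choice([0, 1, 2]) for _ in range(n)]
cols = [standardize(c) for c in (snp, cpg, other_snp)]  # a "Global" model
expr_a = [2.0 * s + rng.gauss(0, 0.3) for s in cols[0]]
expr_b = [rng.gauss(0, 1) for _ in range(n)]
models = {"GENE_A": (cols, expr_a), "GENE_B": (cols, expr_b)}
adjusted = maxt_adjusted_pvalues(models)
print(adjusted)
```

Comparing each observed statistic against the permutation distribution of the maximum gives p-values that are already adjusted for the number of gene models tested, which is how a single permutation loop can serve both purposes. Whether R² is the right statistic to compare across genes with different numbers of predictors is a choice made for this sketch only.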

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30b3/4672920/774e9e9524dc/pgen.1005689.g001.jpg
