Robust network-based analysis of the associations between (epi)genetic measurements.

Author information

Wu Cen, Zhang Qingzhao, Jiang Yu, Ma Shuangge

Affiliations

Department of Statistics, Kansas State University, Manhattan, KS, 66506, USA.

School of Economics and the Wang Yanan Institute for Studies in Economics, Xiamen University.

Publication information

J Multivar Anal. 2018 Nov;168:119-130. doi: 10.1016/j.jmva.2018.06.009. Epub 2018 Jul 10.

DOI: 10.1016/j.jmva.2018.06.009

PMID: 30983643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6456078/

Abstract

Given its important biological implications, the association between gene expression (GE) and copy number variation (CNV) has been modeled extensively. Such analysis is challenging because of the high data dimensionality, the lack of knowledge of which CNVs regulate a specific GE, the different behaviors of cis-acting and trans-acting CNVs, possible long-tailed distributions and contamination of GE measurements, and correlations between CNVs. Existing methods fail to address one or more of these challenges. In this study, we develop a new method to model GE-CNV associations more effectively. Specifically, for each GE, we assume a partially linear model with a nonlinear cis-acting CNV effect. A robust loss function is adopted to accommodate long-tailed distributions and data contamination. We adopt penalization to accommodate the high dimensionality and identify relevant CNVs. A network structure is introduced to accommodate the correlations among CNVs. The proposed method comprehensively accommodates multiple challenging characteristics of GE-CNV modeling and effectively overcomes the limitations of existing methods. We develop an effective computational algorithm and rigorously establish its consistency properties. Simulations show that the proposed method outperforms the alternatives. Analysis of TCGA (The Cancer Genome Atlas) data on the PCD (programmed cell death) pathway shows that the proposed method yields improved prediction and stability, as well as biologically plausible findings.
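The abstract names the modeling ingredients (partially linear model, robust loss, penalization, network structure) without giving formulas. The following is a minimal Python/NumPy sketch that combines them under our own assumptions, not the authors' exact formulation: the nonlinear cis-acting effect is approximated by a cubic polynomial basis (a stand-in for splines), the robust loss is the Huber loss, trans-acting CNVs are selected with an L1 (lasso) penalty, and the CNV network enters through a graph-Laplacian penalty beta' L beta = (1/2) sum_jk a_jk (beta_j - beta_k)^2. The function name fit_robust_network_plm, the proximal-gradient solver, and all tuning values are hypothetical.

```python
import numpy as np


def huber_psi(r, delta):
    """Derivative of the Huber loss: identity near zero, clipped at +/- delta."""
    return np.clip(r, -delta, delta)


def poly_basis(z, degree=3):
    """Hypothetical stand-in for a spline basis: powers of the standardized cis-CNV."""
    zs = (z - z.mean()) / z.std()
    return np.column_stack([zs ** d for d in range(1, degree + 1)])


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_robust_network_plm(y, z, X, A, lam1=0.05, lam2=0.1, delta=1.345,
                           degree=3, n_iter=2000):
    """Hypothetical sketch: proximal-gradient fit of
        (1/n) * sum_i huber(y_i - a - B_i @ gamma - X_i @ beta)
        + lam1 * ||beta||_1 + lam2 * beta' L beta,
    where z is the cis-acting CNV, X holds the trans-acting CNVs, A is a
    symmetric CNV adjacency matrix and L = D - A is its graph Laplacian.
    Only the trans-acting coefficients beta are penalized.
    """
    n = X.shape[0]
    B = poly_basis(z, degree)
    L = np.diag(A.sum(axis=1)) - A
    W = np.column_stack([np.ones(n), B, X])  # intercept | cis basis | trans CNVs
    q = 1 + B.shape[1]                       # number of unpenalized columns
    # Step size 1/Lip, using that the Huber loss has curvature at most 1.
    lip = np.linalg.norm(W, 2) ** 2 / n + 2 * lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    theta = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = -W.T @ huber_psi(y - W @ theta, delta) / n
        grad[q:] += 2 * lam2 * (L @ theta[q:])  # gradient of lam2 * beta' L beta
        theta = theta - step * grad
        theta[q:] = soft_threshold(theta[q:], step * lam1)  # lasso prox on beta only
    return theta[:q], theta[q:]  # (intercept and cis-basis coefs, trans-CNV coefs)
```

Leaving the cis-acting term unpenalized is our assumption; the Laplacian term pulls the coefficients of CNVs linked in the network toward each other, and the bounded Huber derivative limits the influence of contaminated GE values. A practical version would also estimate a residual scale before applying the Huber threshold and tune lam1 and lam2, for example by cross-validation.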

Similar articles

Inferring gene regulatory relationships with a high-dimensional robust approach.

Genet Epidemiol. 2017 Jul;41(5):437-454. doi: 10.1002/gepi.22047. Epub 2017 May 2.

Analysis of cancer gene expression data with an assisted robust marker identification approach.

Genet Epidemiol. 2017 Dec;41(8):779-789. doi: 10.1002/gepi.22066. Epub 2017 Sep 14.

Tissue-Specific eQTL in Zebrafish.

Methods Mol Biol. 2020;2082:239-249. doi: 10.1007/978-1-0716-0026-9_17.

Noise cancellation using total variation for copy number variation detection.

BMC Bioinformatics. 2018 Oct 22;19(Suppl 11):361. doi: 10.1186/s12859-018-2332-x.

MCKAT: a multi-dimensional copy number variant kernel association test.

BMC Bioinformatics. 2021 Dec 11;22(1):588. doi: 10.1186/s12859-021-04494-w.

Genome-wide algorithm for detecting CNV associations with diseases.

BMC Bioinformatics. 2011 Aug 9;12:331. doi: 10.1186/1471-2105-12-331.

Robust semiparametric gene-environment interaction analysis using sparse boosting.

Stat Med. 2019 Oct 15;38(23):4625-4641. doi: 10.1002/sim.8322. Epub 2019 Jul 29.

Identifying gene-environment interactions for prognosis using a robust approach.

Econom Stat. 2017 Oct;4:105-120. doi: 10.1016/j.ecosta.2016.10.004. Epub 2016 Nov 16.

CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data.

Front Genet. 2021 Aug 16;12:700874. doi: 10.3389/fgene.2021.700874. eCollection 2021.

Cited by

JSNMFuP: a unsupervised method for the integrative analysis of single-cell multi-omics data based on non-negative matrix factorization.

BMC Genomics. 2025 Mar 20;26(1):274. doi: 10.1186/s12864-025-11462-8.

Methods for multi-omic data integration in cancer research.

Front Genet. 2024 Sep 19;15:1425456. doi: 10.3389/fgene.2024.1425456. eCollection 2024.

Is Seeing Believing? A Practitioner's Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies.

Entropy (Basel). 2024 Sep 16;26(9):794. doi: 10.3390/e26090794.

The spike-and-slab quantile LASSO for robust variable selection in cancer genomics studies.

Stat Med. 2024 Nov 20;43(26):4928-4983. doi: 10.1002/sim.10196. Epub 2024 Sep 11.

Integrating DNA methylation and gene expression data in a single gene network using the iNETgrate package.

Sci Rep. 2023 Dec 8;13(1):21721. doi: 10.1038/s41598-023-48237-8.

Construction and analysis of sample-specific driver modules for breast cancer.

BMC Genomics. 2022 Oct 20;23(1):717. doi: 10.1186/s12864-022-08928-4.

Identification of aberrantly methylated differentially expressed genes and pro-tumorigenic role of KIF2C in melanoma.

Front Genet. 2022 Jul 22;13:817656. doi: 10.3389/fgene.2022.817656. eCollection 2022.

Integrating Multi-Omics Data for Gene-Environment Interactions.

BioTech (Basel). 2021 Jan 29;10(1):3. doi: 10.3390/biotech10010003.

Sparse group variable selection for gene-environment interactions in the longitudinal study.

Genet Epidemiol. 2022 Jul;46(5-6):317-340. doi: 10.1002/gepi.22461. Epub 2022 Jun 29.

Heterogeneous data integration methods for patient similarity networks.

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac207.
