High-dimension to high-dimension screening for detecting genome-wide epigenetic and noncoding RNA regulators of gene expression.

Affiliations

Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, USA.

Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA.

Publication information

Bioinformatics. 2022 Sep 2;38(17):4078-4087. doi: 10.1093/bioinformatics/btac518.

Abstract

MOTIVATION

Advances in high-throughput technology have made it possible to characterize a wide variety of epigenetic modifications and noncoding RNAs across the genome that are involved in disease pathogenesis via regulation of gene expression. The high dimensionality of both the epigenetic/noncoding RNA data and the gene expression data makes it challenging to identify the important regulators of genes. Conducting a univariate test for each possible regulator-gene pair is subject to a serious multiple-comparison burden, and direct application of regularization methods to select regulator-gene pairs is computationally infeasible. Applying fast screening to reduce the dimension before regularization is more efficient and stable than applying regularization methods alone.

RESULTS

We propose a novel screening method based on robust partial correlation to detect epigenetic and noncoding RNA regulators of gene expression over the whole genome, a problem that involves both high-dimensional predictors and high-dimensional responses. Compared to existing screening methods, our method is conceptually innovative in that it reduces the dimension of both the predictors and the responses, and screens at both the node level (regulators or genes) and the edge level (regulator-gene pairs). We develop data-driven procedures to determine the conditional sets and the optimal screening threshold, and implement a fast iterative algorithm. Simulations and applications to long noncoding RNA and microRNA regulation in kidney cancer and to DNA methylation regulation in glioblastoma multiforme illustrate the validity and advantage of our method.
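To make the idea of edge-level and node-level screening concrete, the following is a minimal sketch in Python. It is not the authors' rPCor algorithm: it substitutes ordinary Pearson-based partial correlation for the robust estimator, and the conditional-set rule, the fixed `threshold`, and the names `screen`, `cond_size`, `X`, and `Y` are all assumptions made for illustration, whereas the abstract describes data-driven procedures for both the conditional sets and the threshold.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, cond):
    """Partial correlation of x and y given the variables in `cond`,
    computed with the standard recursion on the size of the conditional set."""
    if not cond:
        return pearson(x, y)
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def screen(X, Y, cond_size=1, threshold=0.3):
    """Screen regulator-gene pairs by absolute partial correlation.

    X: list of regulator columns, Y: list of gene columns (lists of floats).
    Returns (edges, regulators, genes): retained pairs with their partial
    correlations, plus the regulators and genes that keep at least one edge.
    """
    edges = {}
    for k, y in enumerate(Y):
        marginal = [abs(pearson(x, y)) for x in X]
        ranked = sorted(range(len(X)), key=lambda j: -marginal[j])
        for j, x in enumerate(X):
            # Assumed rule (illustrative only): condition on the other
            # regulators most strongly correlated with this gene.
            cond = [X[i] for i in ranked if i != j][:cond_size]
            r = partial_corr(x, y, cond)
            if abs(r) >= threshold:
                edges[(j, k)] = r
    regulators = sorted({j for j, _ in edges})  # node-level: regulators kept
    genes = sorted({k for _, k in edges})       # node-level: genes kept
    return edges, regulators, genes


# Toy data: 6 regulators, 3 genes, 200 samples; gene 0 is driven by regulator 3.
random.seed(0)
n, p, q = 200, 6, 3
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
Y = [[random.gauss(0, 1) for _ in range(n)] for _ in range(q)]
Y[0] = [2 * X[3][i] + random.gauss(0, 1) for i in range(n)]

edges, regulators, genes = screen(X, Y)
```

In this toy setting the pair (regulator 3, gene 0) should survive the screen, and nodes with no surviving edge drop out of `regulators` and `genes`, which is how the dimension of both the predictor side and the response side is reduced before any regularized model would be fitted.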

AVAILABILITY AND IMPLEMENTATION

The R package, related source codes and real datasets used in this article are provided at https://github.com/kehongjie/rPCor.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

decorate: differential epigenetic correlation test.
Bioinformatics. 2020 May 1;36(9):2856-2861. doi: 10.1093/bioinformatics/btaa067.
