Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge.

Affiliation

Department of Computer Science, Stanford University, Stanford, California, USA.

Publication information

PLoS One. 2013 Jul 18;8(7):e68141. doi: 10.1371/journal.pone.0068141. Print 2013.

DOI:10.1371/journal.pone.0068141

PMID:23874524

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3715474/

Abstract

Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well as or better than existing approaches at a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.
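The general idea of a residual approach, removing known and hidden confounders and keeping what is left, can be illustrated with a minimal sketch. This is not the HCP method itself: the abstract does not give HCP's model, so the sketch swaps in a simpler and widely used stand-in, regressing out known covariates by least squares and then subtracting the top principal components of the residuals as estimated hidden factors. The function name `remove_confounders`, the parameter `k`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def remove_confounders(Y, X, k):
    """Return residual expression after removing known and estimated hidden factors.

    Y: (samples x genes) expression matrix.
    X: (samples x covariates) known covariates.
    k: number of hidden factors to estimate (hypothetical tuning parameter).
    Illustrative stand-in only; this is NOT the HCP model from the paper.
    """
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                    # center each gene
    D = np.column_stack([np.ones(n), X])       # design matrix with intercept
    beta, *_ = np.linalg.lstsq(D, Yc, rcond=None)
    R = Yc - D @ beta                          # residual after known covariates
    # Estimate hidden factors as the top-k principal components of the residual.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Z = U[:, :k] * s[:k]                       # (samples x k) hidden factor scores
    return R - Z @ Vt[:k], Z                   # subtract the rank-k component


# Simulated data (all values are made up for the demonstration).
rng = np.random.default_rng(0)
n, g = 50, 200
X = rng.normal(size=(n, 2))                    # two known covariates
h = rng.normal(size=(n, 1))                    # one hidden factor
Y = (X @ rng.normal(size=(2, g))
     + 3 * (h @ rng.normal(size=(1, g)))
     + rng.normal(size=(n, g)))

cleaned, Z = remove_confounders(Y, X, k=1)
print(cleaned.shape, Z.shape)
```

The cleaned matrix would then be the input to downstream analyses such as cis-eQTL detection or co-expression network construction. How many factors to remove is a judgment call; removing too many risks discarding biological signal along with the confounders.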

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab04/3715474/408e674377ea/pone.0068141.g001.jpg

Similar articles

Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments.

Nat Methods. 2017 Jan;14(1):83-89. doi: 10.1038/nmeth.4068. Epub 2016 Nov 7.

RNA-Seq: Improving Our Understanding of Retinal Biology and Disease.

Cold Spring Harb Perspect Med. 2015 Feb 26;5(9):a017152. doi: 10.1101/cshperspect.a017152.

qSVA framework for RNA quality correction in differential expression analysis.

Proc Natl Acad Sci U S A. 2017 Jul 3;114(27):7130-7135. doi: 10.1073/pnas.1617384114. Epub 2017 Jun 20.

Data Analysis in Single-Cell Transcriptome Sequencing.

Methods Mol Biol. 2018;1754:311-326. doi: 10.1007/978-1-4939-7717-8_18.

Differential Expression Analysis of RNA-seq Reads: Overview, Taxonomy, and Tools.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):566-586. doi: 10.1109/TCBB.2018.2873010. Epub 2018 Oct 1.

A powerful and flexible approach to the analysis of RNA sequence count data.

Bioinformatics. 2011 Oct 1;27(19):2672-8. doi: 10.1093/bioinformatics/btr449. Epub 2011 Aug 2.

Group A Streptococcus Transcriptome Analysis.

Methods Mol Biol. 2020;2136:113-133. doi: 10.1007/978-1-0716-0467-0_8.

Transcriptome Sequencing: RNA-Seq.

Methods Mol Biol. 2018;1754:15-27. doi: 10.1007/978-1-4939-7717-8_2.

RNA-Seq optimization with eQTL gold standards.

BMC Genomics. 2013 Dec 17;14:892. doi: 10.1186/1471-2164-14-892.

Cited by

CorrAdjust unveils biologically relevant transcriptomic correlations by efficiently eliminating hidden confounders.

Nucleic Acids Res. 2025 May 22;53(10). doi: 10.1093/nar/gkaf444.

The Farm Animal Genotype-Tissue Expression (FarmGTEx) Project.

Nat Genet. 2025 Apr;57(4):786-796. doi: 10.1038/s41588-025-02121-5. Epub 2025 Mar 17.

Genetic architecture of RNA editing, splicing and gene expression in schizophrenia.

Hum Mol Genet. 2025 Feb 1;34(3):277-290. doi: 10.1093/hmg/ddae172.

A brief guide to analyzing expression quantitative trait loci.

Mol Cells. 2024 Nov;47(11):100139. doi: 10.1016/j.mocell.2024.100139. Epub 2024 Oct 22.

Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome.

Genome Biol. 2024 Jul 8;25(1):183. doi: 10.1186/s13059-024-03287-7.

Sex-dependent placental methylation quantitative trait loci provide insight into the prenatal origins of childhood onset traits and conditions.

iScience. 2024 Jan 26;27(2):109047. doi: 10.1016/j.isci.2024.109047. eCollection 2024 Feb 16.

PICALO: principal interaction component analysis for the identification of discrete technical, cell-type, and environmental factors that mediate eQTLs.

Genome Biol. 2024 Jan 22;25(1):29. doi: 10.1186/s13059-023-03151-0.

Unifying and generalizing methods for removing unwanted variation based on negative controls.

Stat Sin. 2021 Jul;31(3):1145-1166. doi: 10.5705/ss.202018.0345.

Isoform-level transcriptome-wide association uncovers genetic risk mechanisms for neuropsychiatric disorders in the human brain.

Nat Genet. 2023 Dec;55(12):2117-2128. doi: 10.1038/s41588-023-01560-2. Epub 2023 Nov 30.

Cross-ancestry, cell-type-informed atlas of gene, isoform, and splicing regulation in the developing human brain.

medRxiv. 2023 Mar 6:2023.03.03.23286706. doi: 10.1101/2023.03.03.23286706.

References

The Genotype-Tissue Expression (GTEx) project.

Nat Genet. 2013 Jun;45(6):580-5. doi: 10.1038/ng.2653.

Patterns of cis regulatory variation in diverse human populations.

PLoS Genet. 2012;8(4):e1002639. doi: 10.1371/journal.pgen.1002639. Epub 2012 Apr 19.

Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies.

PLoS Comput Biol. 2012 Jan;8(1):e1002330. doi: 10.1371/journal.pcbi.1002330. Epub 2012 Jan 5.

Using control genes to correct for unwanted variation in microarray data.

Biostatistics. 2012 Jul;13(3):539-52. doi: 10.1093/biostatistics/kxr034. Epub 2011 Nov 17.

Investigation of variation in gene expression profiling of human blood by extended principle component analysis.

PLoS One. 2011;6(10):e26905. doi: 10.1371/journal.pone.0026905. Epub 2011 Oct 27.

Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA.

PLoS Genet. 2011 Aug;7(8):e1002197. doi: 10.1371/journal.pgen.1002197. Epub 2011 Aug 4.

Rare and common regulatory variation in population-scale sequenced human genomes.

PLoS Genet. 2011 Jul;7(7):e1002144. doi: 10.1371/journal.pgen.1002144. Epub 2011 Jul 21.

Bias detection and correction in RNA-Sequencing data.

BMC Bioinformatics. 2011 Jul 19;12:290. doi: 10.1186/1471-2105-12-290.

Sequencing technology does not eliminate biological variability.

Nat Biotechnol. 2011 Jul 11;29(7):572-3. doi: 10.1038/nbt.1910.

Mixed-model coexpression: calculating gene coexpression while accounting for expression heterogeneity.

Bioinformatics. 2011 Jul 1;27(13):i288-94. doi: 10.1093/bioinformatics/btr221.
