RNA-Seq optimization with eQTL gold standards.

Affiliation

McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, USA.

Publication information

BMC Genomics. 2013 Dec 17;14:892. doi: 10.1186/1471-2164-14-892.

Abstract

BACKGROUND

RNA-Sequencing (RNA-Seq) experiments have been optimized for library preparation, mapping, and gene expression estimation. These methods, however, have revealed weaknesses in the subsequent stage of differential expression analysis, where results are sensitive to systematic sample stratification or, in more extreme cases, to outliers. Further, there is no method for assessing the normalization and adjustment measures applied to the data.

RESULTS

To address these issues, we use previously published eQTLs as a novel gold standard at the center of a framework that integrates DNA genotypes and RNA-Seq data to optimize analysis and aid in understanding genetic variation and gene expression. After detecting sample contamination and sequencing outliers in the RNA-Seq data, we used a set of previously published brain eQTLs to determine whether removing the outlier samples was appropriate. Improved replication of known eQTLs supported excluding these samples from downstream analyses. eQTL replication was further employed to assess normalization methods, covariate inclusion, and gene annotation. The method was validated in an independent RNA-Seq blood data set from the GTEx project using a tissue-appropriate set of eQTLs. eQTL replication in both data sets highlights the necessity of accounting for unknown covariates in RNA-Seq data analysis.
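The core idea above, scoring a processing choice by how many previously published eQTLs it replicates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the association test. Here, the following are all assumptions: genotypes coded as 0/1/2 allele dosages; a Pearson correlation with a Fisher z normal approximation as the test; "replicated" meaning p < 0.05 with the same effect direction as published; no covariate adjustment; and hypothetical SNP/gene IDs and toy data. The usage at the end compares replication with and without a contaminated sample.

```python
"""Sketch: score an RNA-Seq processing choice by known-eQTL replication.

Assumptions (not from the paper): 0/1/2 dosage coding, Pearson correlation
with a Fisher z approximation as the test, and "replicated" meaning
p < alpha with the same effect sign as the published eQTL.
"""
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: treat as no association
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r: float, n: int) -> float:
    """Two-sided p-value for r != 0 via the Fisher z transform (normal approx.)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def replication_rate(
    known_eqtls: List[Tuple[str, str, int]],  # (snp_id, gene_id, published sign +1/-1)
    genotypes: Dict[str, List[float]],        # snp_id -> dosage per sample
    expression: Dict[str, List[float]],       # gene_id -> processed expression per sample
    samples: Sequence[int],                   # indices of samples kept by this pipeline
    alpha: float = 0.05,
) -> float:
    """Fraction of testable known eQTLs that replicate in the kept samples."""
    tested = replicated = 0
    for snp, gene, sign in known_eqtls:
        if snp not in genotypes or gene not in expression:
            continue  # this eQTL cannot be tested in this data set
        g = [genotypes[snp][i] for i in samples]
        e = [expression[gene][i] for i in samples]
        r = pearson_r(g, e)
        p = fisher_z_pvalue(r, len(samples))
        tested += 1
        if p < alpha and (r > 0) == (sign > 0):
            replicated += 1
    return replicated / tested if tested else 0.0


# Toy usage with hypothetical IDs; sample 9 mimics a contaminated outlier.
known = [("rs_example_1", "GENE_A", +1)]
genotypes = {"rs_example_1": [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]}
expression = {"GENE_A": [0.1, -0.1, 1.0, 1.1, 2.0, 1.9, 0.0, 0.9, 2.1, 50.0]}

all_samples = list(range(10))
without_outlier = [i for i in all_samples if i != 9]
print(replication_rate(known, genotypes, expression, all_samples))      # 0.0
print(replication_rate(known, genotypes, expression, without_outlier))  # 1.0
```

In this sketch the same `replication_rate` call can be repeated for each candidate normalization, covariate set, or annotation, keeping whichever choice replicates more known eQTLs. A real analysis would use a regression that includes covariates and many more eQTLs.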

CONCLUSION

Because each RNA-Seq experiment is unique, with its own experiment-specific limitations, we offer an easily implemented method that uses the replication of known eQTLs to guide each step of a data analysis pipeline. In the two data sets presented here, we highlight not only the necessity of careful outlier detection but also the need to account for unknown covariates in RNA-Seq experiments.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5eaa/3890578/b42d1f00f647/1471-2164-14-892-1.jpg