Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.

Authors

Paulson Joseph N, Chen Cho-Yi, Lopes-Ramos Camila M, Kuijjer Marieke L, Platig John, Sonawane Abhijeet R, Fagny Maud, Glass Kimberly, Quackenbush John

Affiliations

Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.

Department of Biostatistics, Harvard School of Public Health, Boston, MA, 02215, USA.

Publication information

BMC Bioinformatics. 2017 Oct 3;18(1):437. doi: 10.1186/s12859-017-1847-x.

DOI:10.1186/s12859-017-1847-x
PMID:28974199
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5627434/
Abstract

BACKGROUND

Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies typically profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data - critical first steps for any subsequent analysis.

RESULTS

We find that analysis of large RNA-Seq data sets requires both careful quality control and accounting for the sparsity that arises from the heterogeneity intrinsic to multi-group studies. We developed Yet Another RNA Normalization software pipeline (YARN), which includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets, and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project.
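The abstract does not spell out YARN's gene-filtering rule. As a minimal sketch, assuming a hypothetical rule (keep a gene if, within at least one tissue, it exceeds a CPM threshold in at least a given fraction of that tissue's samples), the Python below contrasts a tissue-aware filter with a global one. The function names and the `cpm_threshold` and `min_fraction` parameters are illustrative assumptions, not the yarn package's API, and the counts are toy values.

```python
import numpy as np


def counts_per_million(counts: np.ndarray) -> np.ndarray:
    """Scale each sample (column) so that its counts sum to one million."""
    lib_sizes = counts.sum(axis=0, keepdims=True)
    return counts / lib_sizes * 1e6


def global_filter(counts, cpm_threshold=1.0, min_fraction=0.5):
    """Keep genes above `cpm_threshold` CPM in at least `min_fraction` of ALL samples."""
    cpm = counts_per_million(counts)
    return (cpm > cpm_threshold).mean(axis=1) >= min_fraction


def tissue_aware_filter(counts, tissues, cpm_threshold=1.0, min_fraction=0.5):
    """Hypothetical tissue-aware rule: keep a gene if, within at least one
    tissue, it is above `cpm_threshold` CPM in at least `min_fraction` of
    that tissue's samples."""
    cpm = counts_per_million(counts)
    tissues = np.asarray(tissues)
    keep = np.zeros(counts.shape[0], dtype=bool)
    for tissue in np.unique(tissues):
        in_tissue = tissues == tissue
        # Fraction of this tissue's samples in which each gene passes the threshold.
        expressed = (cpm[:, in_tissue] > cpm_threshold).mean(axis=1)
        keep |= expressed >= min_fraction
    return keep


# Toy genes x samples count matrix (not real GTEx data).
counts = np.array([
    [500, 500, 500, 500, 500, 500],  # gene 0: expressed everywhere
    [0, 0, 0, 0, 400, 400],          # gene 1: liver-specific
    [0, 0, 0, 0, 0, 0],              # gene 2: never expressed
])
tissues = ["lung", "lung", "muscle", "muscle", "liver", "liver"]

print(global_filter(counts).tolist())                 # [True, False, False]
print(tissue_aware_filter(counts, tissues).tolist())  # [True, True, False]
```

In the toy data, gene 1 is expressed only in the two liver samples: the global filter drops it because it is detected in just a third of all samples, while the tissue-aware filter keeps it. This illustrates one way the sparsity created by multi-tissue heterogeneity can bias a naive, tissue-agnostic filter.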

CONCLUSIONS

An R package instantiating YARN is available at http://bioconductor.org/packages/yarn .
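The R package is the reference implementation, and the abstract does not describe its normalization algorithm. Purely to illustrate what "tissue-aware" normalization can mean, the Python sketch below applies ordinary quantile normalization separately within each tissue, so samples become comparable inside a tissue without forcing different tissues onto one shared distribution. This is a simplified stand-in chosen for illustration, not necessarily what YARN does; the function names are hypothetical, and the input is assumed to be an already-transformed expression matrix (for example log-CPM) with genes as rows and samples as columns.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Every column receives the same set of values (the row-wise mean of the
    sorted columns), placed according to that column's own ranks. Ties are
    broken by position, which is adequate for a sketch.
    """
    reference = np.sort(x, axis=0).mean(axis=1)
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0, kind="stable")
    return reference[ranks]


def tissue_aware_quantile_normalize(x: np.ndarray, tissues) -> np.ndarray:
    """Hypothetical helper: quantile-normalize independently within each tissue."""
    tissues = np.asarray(tissues)
    out = np.empty(x.shape, dtype=float)
    for tissue in np.unique(tissues):
        in_tissue = tissues == tissue
        out[:, in_tissue] = quantile_normalize(x[:, in_tissue])
    return out


# Toy matrix: tissue "A" has low values, tissue "B" has high values.
x = np.array([
    [5.0, 2.0, 10.0, 30.0],
    [1.0, 4.0, 20.0, 40.0],
    [3.0, 6.0, 15.0, 50.0],
])
tissues = ["A", "A", "B", "B"]

# Columns within A now share one distribution, columns within B another.
print(tissue_aware_quantile_normalize(x, tissues))
```

Running plain `quantile_normalize` on all four columns would instead give tissues A and B identical value distributions, erasing a between-tissue difference that a multi-tissue study may want to keep.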

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f57/5627434/6d3beef39edc/12859_2017_1847_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f57/5627434/0bf33d9bb140/12859_2017_1847_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f57/5627434/67ecc9e934e8/12859_2017_1847_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f57/5627434/ed5d90e2ed31/12859_2017_1847_Fig4_HTML.jpg

Similar articles

1
Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.
BMC Bioinformatics. 2017 Oct 3;18(1):437. doi: 10.1186/s12859-017-1847-x.
2
Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.
Bioinformatics. 2017 Apr 15;33(8):1179-1186. doi: 10.1093/bioinformatics/btw777.
3
Normalization of Single-Cell RNA-Seq Data.
Methods Mol Biol. 2021;2284:303-329. doi: 10.1007/978-1-0716-1307-8_17.
4
scPipe: A flexible R/Bioconductor preprocessing pipeline for single-cell RNA-sequencing data.
PLoS Comput Biol. 2018 Aug 10;14(8):e1006361. doi: 10.1371/journal.pcbi.1006361. eCollection 2018 Aug.
5
RNA-Seq optimization with eQTL gold standards.
BMC Genomics. 2013 Dec 17;14:892. doi: 10.1186/1471-2164-14-892.
6
Flexible expressed region analysis for RNA-seq with derfinder.
Nucleic Acids Res. 2017 Jan 25;45(2):e9. doi: 10.1093/nar/gkw852. Epub 2016 Sep 29.
7
A Zipf-plot based normalization method for high-throughput RNA-seq data.
PLoS One. 2020 Apr 9;15(4):e0230594. doi: 10.1371/journal.pone.0230594. eCollection 2020.
8
scruff: an R/Bioconductor package for preprocessing single-cell RNA-sequencing data.
BMC Bioinformatics. 2019 May 2;20(1):222. doi: 10.1186/s12859-019-2797-2.
9
pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components.
BMC Bioinformatics. 2019 Jun 13;20(1):331. doi: 10.1186/s12859-019-2879-1.
10
IBRAP: integrated benchmarking single-cell RNA-sequencing analytical pipeline.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad061.

Cited by

1
Telomeric repeat-containing RNA increases in aged human cells.
Nucleic Acids Res. 2025 Jul 8;53(13). doi: 10.1093/nar/gkaf597.
2
CoGTEx: Unscaled system-level coexpression estimation from GTEx data forecast novel functional gene partners.
PLoS One. 2024 Oct 4;19(10):e0309961. doi: 10.1371/journal.pone.0309961. eCollection 2024.
3
Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets.
BMC Genomics. 2024 May 6;25(1):444. doi: 10.1186/s12864-024-10364-5.
4
Sex-biased gene expression and gene-regulatory networks of sex-biased adverse event drug targets and drug metabolism genes.
BMC Pharmacol Toxicol. 2024 Jan 2;25(1):5. doi: 10.1186/s40360-023-00727-1.
5
Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data.
Bioinformatics. 2023 Oct 3;39(10). doi: 10.1093/bioinformatics/btad610.
6
Sex-biased gene expression and gene-regulatory networks of sex-biased adverse event drug targets and drug metabolism genes.
bioRxiv. 2023 Nov 15:2023.05.23.541950. doi: 10.1101/2023.05.23.541950.
7
Integrative profiling of gene expression and chromatin accessibility elucidates specific transcriptional networks in porcine neutrophils.
Front Genet. 2023 May 23;14:1107462. doi: 10.3389/fgene.2023.1107462. eCollection 2023.
8
The Network Zoo: a multilingual package for the inference and analysis of gene regulatory networks.
Genome Biol. 2023 Mar 9;24(1):45. doi: 10.1186/s13059-023-02877-1.
9
HGCA2.0: An RNA-Seq Based Webtool for Gene Coexpression Analysis in .
Cells. 2023 Jan 21;12(3):388. doi: 10.3390/cells12030388.
10
Network analysis reveals rare disease signatures across multiple levels of biological organization.
Nat Commun. 2021 Nov 9;12(1):6306. doi: 10.1038/s41467-021-26674-1.