Increased comparability between RNA-Seq and microarray data by utilization of gene sets.

Affiliation

Swammerdam Institute for Life Sciences, University of Amsterdam.

Publication information

PLoS Comput Biol. 2020 Sep 30;16(9):e1008295. doi: 10.1371/journal.pcbi.1008295. eCollection 2020 Sep.

DOI:10.1371/journal.pcbi.1008295
PMID:32997685
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7549825/
Abstract

The field of transcriptomics uses and measures mRNA as a proxy for gene expression. Two major platforms are currently in use for quantifying mRNA: microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust method to increase the comparability of both platforms, enabling analysis of merged data from both platforms. We transformed high-dimensional transcriptomics data from two different platforms into a lower-dimensional, biologically relevant dataset by calculating enrichment scores based on gene set collections for all samples. We compared the similarity between data from both platforms based on the raw data and on the enrichment scores. We show that this transformation changes the data in a biologically relevant way and filters out noise, which leads to increased platform concordance. We validate the procedure with predictive models that are built on microarray-based enrichment scores and then used to predict breast cancer subtypes from enrichment scores based on sequencing data. Although microarray and RNA-Seq expression levels might appear different, transforming them into biologically relevant gene set enrichment scores significantly increases their correlation, which is a step forward in data integration of the two platforms. The gene set collections were shown to contain biologically relevant gene sets. More in-depth investigation of the effect of the composition, size, and number of gene sets used for the transformation is suggested for future research.
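The abstract describes the pipeline only at a high level. The following is a minimal Python/NumPy sketch of its three steps under stated assumptions: (1) each sample's gene-level values are collapsed into one score per gene set, (2) matched samples from the two platforms are compared by Pearson correlation at the gene level and at the gene-set level, and (3) a classifier trained on microarray-based scores is applied to RNA-Seq-based scores. The scoring rule here is a simplified within-sample rank average, not GSVA and not the authors' exact enrichment method; the nearest-centroid classifier is likewise an assumed substitute, since the abstract does not name the model. All function names and the toy data are hypothetical.

```python
import numpy as np

# Sketch of the pipeline described in the abstract. ASSUMPTIONS: the scoring
# rule, classifier, function names, and toy data are hypothetical stand-ins,
# not the authors' implementation and not GSVA.


def rank_within_sample(expr):
    """Replace each sample's values (one column) by ranks scaled to [0, 1]."""
    # argsort of argsort yields 0-based ranks; the stable sort breaks ties by row order
    ranks = np.argsort(np.argsort(expr, axis=0, kind="stable"), axis=0)
    return ranks.astype(float) / (expr.shape[0] - 1)


def gene_set_scores(expr, genes, gene_sets):
    """Collapse a genes x samples matrix into a gene-sets x samples matrix.

    Simplified score (assumption): mean scaled rank of the set's genes within
    each sample, minus 0.5, so 0 means "at the sample median".
    """
    index = {g: i for i, g in enumerate(genes)}
    ranks = rank_within_sample(np.asarray(expr, dtype=float))
    rows_out = []
    for name, members in gene_sets.items():
        rows = [index[g] for g in members if g in index]  # skip genes not measured
        if not rows:
            raise ValueError(f"gene set {name!r} has no measured genes")
        rows_out.append(ranks[rows].mean(axis=0) - 0.5)
    return np.vstack(rows_out), list(gene_sets)


def matched_sample_correlation(a, b):
    """Pearson correlation of each sample (column) with its counterpart on the
    other platform; a and b must share row and column order."""
    return np.array([np.corrcoef(a[:, j], b[:, j])[0, 1] for j in range(a.shape[1])])


def fit_centroids(scores, labels):
    """Nearest-centroid classifier, an assumed stand-in for the paper's models."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.column_stack([scores[:, labels == c].mean(axis=1) for c in classes])
    return classes, centroids


def predict(scores, classes, centroids):
    # Squared Euclidean distance from every sample (column) to every centroid
    dist = ((scores[:, :, None] - centroids[:, None, :]) ** 2).sum(axis=0)
    return [classes[k] for k in dist.argmin(axis=1)]


# Toy example (made-up values): 5 genes x 2 samples. "rnaseq" is a monotone,
# non-linear rescaling of "microarray", standing in for a difference in
# measurement scale between the two platforms.
genes = ["A", "B", "C", "D", "E"]
microarray = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [4.0, 2.0], [5.0, 1.0]])
rnaseq = np.exp(microarray)
gene_sets = {"low": ["A", "B"], "high": ["D", "E", "Z"]}  # "Z" is not measured

ma_scores, set_names = gene_set_scores(microarray, genes, gene_sets)
seq_scores, _ = gene_set_scores(rnaseq, genes, gene_sets)
print("gene level:", matched_sample_correlation(microarray, rnaseq))
print("set level: ", matched_sample_correlation(ma_scores, seq_scores))

# Train on microarray-based scores, predict from RNA-Seq-based scores
classes, centroids = fit_centroids(ma_scores, ["subtype1", "subtype2"])
print(predict(seq_scores, classes, centroids))
```

On this toy input the gene-level correlation is below 1 because the two "platforms" differ by a non-linear rescaling, while the set-level scores match exactly because within-sample ranks ignore any monotone rescaling. That illustrates only one possible source of discordance; it does not reproduce the noise-filtering effect reported in the abstract. Averaging over many genes per set is what would be expected to damp independent gene-level noise, and for real data a published per-sample method such as GSVA would be the more realistic choice than this simplified score.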


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/224d7be56ead/pcbi.1008295.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/f21918c720b8/pcbi.1008295.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/330cc90fc8dd/pcbi.1008295.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/90765fd5e905/pcbi.1008295.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/048e88f4d57f/pcbi.1008295.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/299a0e82ec42/pcbi.1008295.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/d286b03f3b7c/pcbi.1008295.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/24da3e9f37a3/pcbi.1008295.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06cf/7549825/e0f6fba1546d/pcbi.1008295.g009.jpg

Similar articles

1
Increased comparability between RNA-Seq and microarray data by utilization of gene sets.
PLoS Comput Biol. 2020 Sep 30;16(9):e1008295. doi: 10.1371/journal.pcbi.1008295. eCollection 2020 Sep.
2
Integration of RNA-Seq data with heterogeneous microarray data for breast cancer profiling.
BMC Bioinformatics. 2017 Nov 21;18(1):506. doi: 10.1186/s12859-017-1925-0.
3
Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability.
PLoS One. 2015 May 12;10(5):e0126545. doi: 10.1371/journal.pone.0126545. eCollection 2015.
4
Novel and simple transformation algorithm for combining microarray data sets.
BMC Bioinformatics. 2007 Jun 25;8:218. doi: 10.1186/1471-2105-8-218.
5
Using microarray-based subtyping methods for breast cancer in the era of high-throughput RNA sequencing.
Mol Oncol. 2018 Dec;12(12):2136-2146. doi: 10.1002/1878-0261.12389. Epub 2018 Oct 29.
6
Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes.
BMC Bioinformatics. 2005 Nov 4;6:265. doi: 10.1186/1471-2105-6-265.
7
Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells.
PLoS One. 2014 Jan 16;9(1):e78644. doi: 10.1371/journal.pone.0078644. eCollection 2014.
8
Rapid Transient Transcriptional Adaptation to Hypergravity in Jurkat T Cells Revealed by Comparative Analysis of Microarray and RNA-Seq Data.
Int J Mol Sci. 2021 Aug 6;22(16):8451. doi: 10.3390/ijms22168451.
9
Improving Gene-Set Enrichment Analysis of RNA-Seq Data with Small Replicates.
PLoS One. 2016 Nov 9;11(11):e0165919. doi: 10.1371/journal.pone.0165919. eCollection 2016.
10
Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data.
Bioinformatics. 2018 Jun 1;34(11):1868-1874. doi: 10.1093/bioinformatics/bty026.

Cited by

1
The Role of Microarray in Modern Sequencing: Statistical Approach Matters in a Comparison Between Microarray and RNA-Seq.
BioTech (Basel). 2025 Jul 5;14(3):55. doi: 10.3390/biotech14030055.
2
Integrated Computational Analysis Reveals Early Genetic and Epigenetic AML Susceptibility Biomarkers in Benzene-Exposed Workers.
Int J Mol Sci. 2025 Jan 28;26(3):1138. doi: 10.3390/ijms26031138.
3
DEG (differentially expressed gene) or not DEG that is the question: Should we compare between datasets or not?
J Mol Cell Cardiol Plus. 2022 Dec 23;3:100029. doi: 10.1016/j.jmccpl.2022.100029. eCollection 2023 Mar.
4
Deciphering early responsive signature genes in rice blast disease: an integrated temporal transcriptomic study.
J Appl Genet. 2024 Dec;65(4):665-681. doi: 10.1007/s13353-024-00901-z. Epub 2024 Aug 24.
5
Application of Transcriptome-Based Gene Set Featurization for Machine Learning Model to Predict the Origin of Metastatic Cancer.
Curr Issues Mol Biol. 2024 Jul 9;46(7):7291-7302. doi: 10.3390/cimb46070432.
6
Protocol for performing metabolic pathway-based subtyping of breast tumors.
STAR Protoc. 2024 Sep 20;5(3):103173. doi: 10.1016/j.xpro.2024.103173. Epub 2024 Jul 4.
7
Omic horizon expression: a database of gene expression based on RNA sequencing data.
BMC Genomics. 2023 Nov 8;24(1):674. doi: 10.1186/s12864-023-09781-9.
8
Data Mining of Microarray Datasets in Translational Neuroscience.
Brain Sci. 2023 Sep 14;13(9):1318. doi: 10.3390/brainsci13091318.
9
COVPRIG robustly predicts the overall survival of IDH wild-type glioblastoma and highlights METTL1 neural-progenitor-like tumor cell in driving unfavorable outcome.
J Transl Med. 2023 Aug 8;21(1):533. doi: 10.1186/s12967-023-04382-2.
10
Fucose as a potential therapeutic molecule against the immune-mediated inflammation in IgA nepharopathy: An unrevealed link.
Front Immunol. 2022 Aug 17;13:929138. doi: 10.3389/fimmu.2022.929138. eCollection 2022.

References

1
Comparison of RNA-Seq and Microarray Gene Expression Platforms for the Toxicogenomic Evaluation of Liver From Short-Term Rat Toxicity Studies.
Front Genet. 2019 Jan 22;9:636. doi: 10.3389/fgene.2018.00636. eCollection 2018.
2
Cross-platform normalization of microarray and RNA-seq data for machine learning applications.
PeerJ. 2016 Jan 21;4:e1621. doi: 10.7717/peerj.1621. eCollection 2016.
3
The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Cell Syst. 2015 Dec 23;1(6):417-425. doi: 10.1016/j.cels.2015.12.004.
4
Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells.
PLoS One. 2014 Jan 16;9(1):e78644. doi: 10.1371/journal.pone.0078644. eCollection 2014.
5
GSVA: gene set variation analysis for microarray and RNA-seq data.
BMC Bioinformatics. 2013 Jan 16;14:7. doi: 10.1186/1471-2105-14-7.
6
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
Nature. 2012 Mar 28;483(7391):603-7. doi: 10.1038/nature11003.
7
Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1.
Nature. 2009 Nov 5;462(7269):108-12. doi: 10.1038/nature08460. Epub 2009 Oct 21.
8
Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt.
Nat Protoc. 2009;4(8):1184-91. doi: 10.1038/nprot.2009.97. Epub 2009 Jul 23.
9
Estimating accuracy of RNA-Seq and microarrays with proteomics.
BMC Genomics. 2009 Apr 16;10:161. doi: 10.1186/1471-2164-10-161.
10
Matrix correlations for high-dimensional data: the modified RV-coefficient.
Bioinformatics. 2009 Feb 1;25(3):401-5. doi: 10.1093/bioinformatics/btn634. Epub 2008 Dec 10.