A guide to creating design matrices for gene expression experiments.

Affiliations

The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.

Department of Medical Biology, The University of Melbourne, Parkville, 3010, Australia.

Publication information

F1000Res. 2020 Dec 10;9:1444. doi: 10.12688/f1000research.27893.1. eCollection 2020.

DOI:10.12688/f1000research.27893.1
PMID:33604029
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7873980/
Abstract

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by well-described analysis pipelines. However, two crucial steps in the analysis process can be a stumbling block for many: setting up an appropriate model via design matrices, and setting up the comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of models and the differences between them to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
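
To make the two steps named in the abstract concrete, the following is a minimal sketch of a design matrix and a contrast for the simplest case it mentions, a model with a single explanatory variable. It is not taken from the article and does not use the API of any differential expression package: the sample labels, the expression values and the helper `fit_least_squares` are all hypothetical, and plain Python is used only for illustration. The design is assumed to be full rank.

```python
# Hypothetical experiment: six samples, one explanatory variable with two levels.
group = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
levels = sorted(set(group))  # ['ctrl', 'trt']

# Design matrix for a means model (no intercept): one indicator column per level.
design = [[1.0 if g == level else 0.0 for level in levels] for g in group]

# Contrast for the comparison of interest: trt minus ctrl,
# written as weights on the columns of the design matrix.
contrast = [-1.0, 1.0]

# Made-up log-expression values for one gene, in the same sample order.
y = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]


def fit_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    Illustrative helper only; assumes X has full column rank.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# With a means model, each coefficient is the fitted mean of one group.
coef = fit_least_squares(design, y)

# Applying the contrast gives the estimated difference between group means.
log_fold_change = sum(c * b for c, b in zip(contrast, coef))
```

In this parameterisation the contrast carries the comparison. A model with an intercept would instead encode the same difference directly as one of its coefficients, which is the kind of choice between equivalent models that the abstract refers to.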

