Length bias correction for RNA-seq data in gene set analyses.

Affiliation

Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA.

Publication information

Bioinformatics. 2011 Mar 1;27(5):662-9. doi: 10.1093/bioinformatics/btr005. Epub 2011 Jan 19.

DOI:10.1093/bioinformatics/btr005
PMID:21252076
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3042188/
Abstract

MOTIVATION

Next-generation sequencing technologies are being rapidly applied to quantifying transcripts (RNA-seq). However, due to the unique properties of RNA-seq data, the differential expression of longer transcripts is more likely to be identified than that of shorter transcripts with the same effect size. This bias complicates downstream gene set analysis (GSA), because the GSA methods previously developed for microarray data assume that genes with the same effect size have equal probability (power) of being identified as significantly differentially expressed. Since transcript length is not related to gene expression, adjusting for this length dependency in GSA becomes necessary.
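The length dependency can be illustrated with a short calculation. This is a sketch under assumed notation, not the authors' own derivation: gene g has transcript length L_g, both conditions have the same sequencing depth N, and μ_{g1}, μ_{g2} are per-base expression rates. All of these symbols are introduced here for illustration.

```latex
% Assumed Poisson model for gene g in conditions 1 and 2
X_{g1} \sim \mathrm{Poisson}(N L_g \mu_{g1}), \qquad
X_{g2} \sim \mathrm{Poisson}(N L_g \mu_{g2})

% A simple standardized difference of the two counts
T_g = \frac{X_{g1} - X_{g2}}{\sqrt{X_{g1} + X_{g2}}}

% Replacing the counts by their expectations
\mathrm{E}[T_g] \approx
\frac{N L_g (\mu_{g1} - \mu_{g2})}{\sqrt{N L_g (\mu_{g1} + \mu_{g2})}}
= \sqrt{N L_g}\,\frac{\mu_{g1} - \mu_{g2}}{\sqrt{\mu_{g1} + \mu_{g2}}}
```

For fixed rates μ_{g1} and μ_{g2}, the expected statistic grows in proportion to the square root of L_g, which is why a longer transcript with the same effect size is more likely to be called differentially expressed.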

RESULTS

In this article, we propose two approaches to transcript-length adjustment for analyses based on Poisson models: (i) At the individual gene level, we adjust each gene's test statistic using the square root of transcript length and then test each gene set using the Wilcoxon rank-sum test. (ii) At the gene set level, we adjust the null distribution for Fisher's exact test by weighting the identification probability of each gene using the square root of its transcript length. We evaluated these two approaches using simulations and a real dataset, and showed that they can effectively reduce transcript-length bias. The top-ranked GO terms obtained from the proposed adjustments overlap more with the microarray results.
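The two adjustments described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names are hypothetical, approach (i) is assumed to divide each statistic by the square root of transcript length, and approach (ii) is approximated by Monte Carlo draws of differentially expressed genes with probability proportional to the square root of length, rather than by an exact weighted null distribution.

```python
import math
import random


def length_adjusted_stats(stats, lengths):
    """Approach (i), assumed form: divide each gene's statistic by sqrt(length)."""
    return [s / math.sqrt(length) for s, length in zip(stats, lengths)]


def rank_sum_z(in_set, out_set):
    """Wilcoxon rank-sum z-score via the normal approximation.

    Tied values receive midranks; the variance is not corrected for ties.
    """
    pooled = sorted([(v, True) for v in in_set] + [(v, False) for v in out_set])
    n1, n2 = len(in_set), len(out_set)
    w = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        w += midrank * sum(1 for _, member in pooled[i:j] if member)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    var = n1 * n2 * (n1 + n2 + 1) / 12.0
    return (w - mean) / math.sqrt(var)


def weighted_null_pvalue(is_de, in_set, lengths, n_draws=2000, seed=0):
    """Approach (ii), Monte Carlo stand-in for a length-weighted null.

    Draws as many genes as were called differentially expressed, without
    replacement and with weight sqrt(length), then counts how often the
    overlap with the gene set reaches the observed overlap.
    """
    rng = random.Random(seed)
    k = sum(1 for d in is_de if d)
    observed = sum(1 for d, s in zip(is_de, in_set) if d and s)
    weights = [math.sqrt(length) for length in lengths]
    hits = 0
    for _ in range(n_draws):
        # Efraimidis-Spirakis keys: the k largest u**(1/w) form a weighted sample
        keys = sorted(
            ((rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)),
            reverse=True,
        )
        overlap = sum(1 for _, i in keys[:k] if in_set[i])
        if overlap >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)
```

In use, `rank_sum_z` would be applied to the adjusted statistics of the genes inside and outside a gene set, and `weighted_null_pvalue` to per-gene boolean flags for differential expression and gene set membership. A large positive z-score or a small p-value would suggest enrichment that is not explained by transcript length alone.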

AVAILABILITY

R scripts are available at http://www.soph.uab.edu/Statgenetics/People/XCui/r-codes/.
