Bias correction and Bayesian analysis of aggregate counts in SAGE libraries.

Affiliation

Department of Statistics, Operations, and Management Science, The University of Tennessee, 331 Stokely Management Center, Knoxville, TN 37996, USA.

Publication information

BMC Bioinformatics. 2010 Feb 3;11:72. doi: 10.1186/1471-2105-11-72.

DOI: 10.1186/1471-2105-11-72
PMID: 20128916
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2829012/
Abstract

BACKGROUND

Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE, this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power.
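
To make the source of the bias concrete, here is a minimal sketch under a simplifying assumption that is not stated in the abstract: each cleavage site on a transcript is cut independently with the same probability `p_cut`, and the tag forms at the first cut site counted from the 3' end. The function names and parameter values are hypothetical and chosen only for illustration.

```python
def tag_formation_probs(p_cut: float, n_sites: int) -> list[float]:
    """Probability that the tag forms at each site, indexed from the 3' end.

    Assumption (illustrative only): sites are cut independently with
    probability p_cut, and the tag comes from the 3'-most site that was cut.
    Site k therefore requires k uncut sites followed by one cut.
    """
    return [p_cut * (1.0 - p_cut) ** k for k in range(n_sites)]


def fraction_discarded(p_cut: float, n_sites: int) -> float:
    """Share of observed tags lost if only the 3'-most tag is kept."""
    probs = tag_formation_probs(p_cut, n_sites)
    return 1.0 - probs[0] / sum(probs)


if __name__ == "__main__":
    # Lower cutting efficiency shifts tags away from the 3'-most site.
    for p in (0.9, 0.7, 0.5):
        probs = tag_formation_probs(p, n_sites=4)
        print(p, [round(x, 4) for x in probs],
              round(fraction_discarded(p, n_sites=4), 4))
```

Under this assumption, the less efficient the digestion, the larger the share of tags that fall outside the 3'-most site and would be thrown away by the 3'-most-only strategy.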

RESULTS

Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression. A description is given as to how these algorithms can be applied in that context.
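
As a rough illustration of Bayesian inference about tag formation, the sketch below estimates a single cutting probability from the tag counts observed at each site of one gene. It is not one of the three hierarchical models proposed in the paper: it assumes a truncated geometric formation model, a uniform prior on `p_cut`, and a simple grid approximation, and all names and counts are hypothetical.

```python
import math


def posterior_p_cut(counts: list[int], grid_size: int = 999) -> tuple[list[float], list[float]]:
    """Grid posterior for p_cut given tag counts per site (3'-most site first).

    Assumed likelihood (illustrative): conditional on a tag being observed,
    it comes from site k with probability
        p * (1 - p)**k / (1 - (1 - p)**n_sites).
    The prior on p is uniform over the open interval (0, 1).
    """
    n_sites = len(counts)
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    log_post = []
    for p in grid:
        log_norm = math.log(1.0 - (1.0 - p) ** n_sites)
        log_lik = sum(
            c * (math.log(p) + k * math.log(1.0 - p) - log_norm)
            for k, c in enumerate(counts)
        )
        log_post.append(log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid: list[float], weights: list[float]) -> float:
    """Posterior mean of p_cut on the grid."""
    return sum(g * w for g, w in zip(grid, weights))


if __name__ == "__main__":
    # Hypothetical counts at three sites of one gene.
    grid, weights = posterior_p_cut([80, 16, 4])
    print(round(posterior_mean(grid, weights), 3))
```

A grid is used here only because the truncation term breaks the usual Beta conjugacy; a hierarchical model over many genes would need a sampler or another inference scheme.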

CONCLUSIONS

Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm, providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and false negative findings due to variation in tag formation probabilities across samples when testing for differential expression.
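
The false positive risk can be sketched with a small calculation. It assumes, purely for illustration, that a gene is expressed at the same level in two libraries, that sites are cut independently with a library-specific probability, and that library-size normalization is ignored. The function names and the values 0.9 and 0.6 are hypothetical.

```python
def expected_tag_counts(n_transcripts: int, p_cut: float, n_sites: int) -> list[float]:
    """Expected tag counts per site (3'-most first) for one gene.

    Assumption (illustrative): each sampled transcript yields a tag at
    site k with probability p_cut * (1 - p_cut)**k.
    """
    return [n_transcripts * p_cut * (1.0 - p_cut) ** k for k in range(n_sites)]


def apparent_fold_changes(p_cut_a: float, p_cut_b: float, n_sites: int,
                          n_transcripts: int = 1000) -> tuple[list[float], float]:
    """Per-site and aggregate count ratios (A over B) at equal true expression."""
    a = expected_tag_counts(n_transcripts, p_cut_a, n_sites)
    b = expected_tag_counts(n_transcripts, p_cut_b, n_sites)
    per_site = [x / y for x, y in zip(a, b)]
    aggregate = sum(a) / sum(b)
    return per_site, aggregate


if __name__ == "__main__":
    per_site, aggregate = apparent_fold_changes(0.9, 0.6, n_sites=2)
    print([round(r, 3) for r in per_site], round(aggregate, 3))
```

Even though true expression is identical, the 3'-most tag looks more abundant in library A and the second tag looks less abundant, while the aggregate count is distorted less. This is the kind of spurious difference a test can pick up when formation probabilities vary across samples.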

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32bb/2829012/4c9b8a71eb23/1471-2105-11-72-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32bb/2829012/19a4c9523aa0/1471-2105-11-72-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32bb/2829012/d4e349ff3d40/1471-2105-11-72-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32bb/2829012/ad2a6edbbd6f/1471-2105-11-72-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32bb/2829012/3f6ed9812efa/1471-2105-11-72-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32bb/2829012/5f0092de7513/1471-2105-11-72-6.jpg

Similar articles

1
Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework.
BMC Bioinformatics. 2007 Oct 18;8:403. doi: 10.1186/1471-2105-8-403.
2
Correction of sequence-based artifacts in serial analysis of gene expression.
Bioinformatics. 2004 May 22;20(8):1254-63. doi: 10.1093/bioinformatics/bth077. Epub 2004 Feb 10.
3
Statistical modeling of sequencing errors in SAGE libraries.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i31-9. doi: 10.1093/bioinformatics/bth924.
4
[Transcriptomes for serial analysis of gene expression].
J Soc Biol. 2002;196(4):303-7.
5
POWER_SAGE: comparing statistical tests for SAGE experiments.
Bioinformatics. 2000 Nov;16(11):953-9. doi: 10.1093/bioinformatics/16.11.953.
6
Identitag, a relational database for SAGE tag identification and interspecies comparison of SAGE libraries.
BMC Bioinformatics. 2004 Oct 6;5:143. doi: 10.1186/1471-2105-5-143.
7
A comparative analysis of the information content in long and short SAGE libraries.
BMC Bioinformatics. 2006 Nov 16;7:504. doi: 10.1186/1471-2105-7-504.
8
Correction of technology-related artifacts in serial analysis of gene expression.
Methods Mol Biol. 2008;387:133-42. doi: 10.1007/978-1-59745-454-4_10.
9
Modeling Sage data with a truncated gamma-Poisson model.
BMC Bioinformatics. 2006 Mar 20;7:157. doi: 10.1186/1471-2105-7-157.

Cited by

1
The analytical landscape of static and temporal dynamics in transcriptome data.
Front Genet. 2014 Feb 20;5:35. doi: 10.3389/fgene.2014.00035. eCollection 2014.
2
Differential genome-wide profiling of tandem 3' UTRs among human breast cancer and normal cells by high-throughput sequencing.
Genome Res. 2011 May;21(5):741-7. doi: 10.1101/gr.115295.110. Epub 2011 Apr 7.
