Discarding duplicate ditags in LongSAGE analysis may introduce significant error.

Authors

Emmersen Jeppe, Heidenblut Anna M, Høgh Annabeth Laursen, Hahn Stephan A, Welinder Karen G, Nielsen Kåre L

Affiliation

Department of Biotechnology, Chemistry and Environmental Engineering, Aalborg University, Aalborg, Denmark.

Publication

BMC Bioinformatics. 2007 Mar 14;8:92. doi: 10.1186/1471-2105-8-92.

DOI:10.1186/1471-2105-8-92
PMID:17359537
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1839111/
Abstract

BACKGROUND

During gene expression analysis by Serial Analysis of Gene Expression (SAGE), duplicate ditags are routinely removed from the data analysis because they are suspected to stem from artifacts during SAGE library construction. As a consequence, naturally occurring duplicate ditags are also removed from the analysis, which leads to a measurement error.
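The effect described above can be illustrated with a small sketch. It is not taken from the paper: the tag names, the toy library, and the choice to treat a ditag as an unordered pair of tags are assumptions made only for illustration. The sketch counts tags once with all ditags kept and once with duplicate ditags discarded.

```python
from collections import Counter

def tag_counts(ditags, drop_duplicates=False):
    """Count tags in a list of (tag_a, tag_b) ditags.

    Assumption: a ditag is identified by its unordered pair of tags,
    so (A, B) and (B, A) count as the same ditag.
    """
    if drop_duplicates:
        seen = set()
        kept = []
        for a, b in ditags:
            key = tuple(sorted((a, b)))
            if key not in seen:  # keep only the first copy of each ditag
                seen.add(key)
                kept.append((a, b))
        ditags = kept
    counts = Counter()
    for a, b in ditags:
        counts[a] += 1
        counts[b] += 1
    return counts

# Hypothetical toy library: HIGH is an abundant tag, so the same ditags
# form repeatedly by chance and not through any amplification artifact.
library = (
    [("HIGH", "HIGH")] * 3
    + [("HIGH", "MID")] * 2
    + [("HIGH", "LOW"), ("MID", "LOW")]
)

all_counts = tag_counts(library)
dedup_counts = tag_counts(library, drop_duplicates=True)

for tag in ("HIGH", "MID", "LOW"):
    print(tag, all_counts[tag], dedup_counts[tag])
```

In this toy library the abundant tag loses more than half of its count after duplicate removal, while the rare tag is unaffected, so the error falls mainly on highly expressed tags.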

RESULTS

An algorithm was developed to analyze the differential occurrence of SAGE tags in different ditag combinations. Analysis of a pancreatic acinar cell LongSAGE library showed no sign of a general amplification bias that would justify the removal of all duplicate ditags. Extending the analysis to 10 additional LongSAGE libraries showed no justification for removal of all duplicate ditags either. On the contrary, while the error introduced in original SAGE by removal of naturally occurring duplicate ditags is insignificant, it leads to an error of up to 3-fold in LongSAGE. However, the algorithm developed for the analysis of duplicate ditags was able to identify individual artifact ditags that originated from rare nucleotide variations of tags and vector contamination.
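The abstract does not spell out how the algorithm works, so the following is only a heuristic sketch of one way to look at tags across ditag combinations, not the authors' method. It assumes that a genuine tag should be ligated to several different partner tags, and flags tags whose occurrences all come from copies of a single ditag. The function names, the `min_count` threshold, and the example data are hypothetical.

```python
from collections import Counter, defaultdict

def partner_profiles(ditags):
    """Map each tag to a Counter of the partner tags it was ligated to."""
    profiles = defaultdict(Counter)
    for a, b in ditags:
        profiles[a][b] += 1
        profiles[b][a] += 1
    return profiles

def flag_single_partner_tags(ditags, min_count=3):
    """Return tags seen at least min_count times but with only one partner.

    Assumption: such tags are candidates for artifact review; this is a
    screening heuristic, not a statistical test.
    """
    flagged = []
    for tag, partners in partner_profiles(ditags).items():
        total = sum(partners.values())
        if total >= min_count and len(partners) == 1:
            flagged.append(tag)
    return sorted(flagged)

# Hypothetical example: ODD occurs three times, always paired with GENE4.
example = [
    ("GENE1", "GENE2"),
    ("GENE1", "GENE3"),
    ("GENE1", "GENE4"),
    ("GENE2", "GENE3"),
    ("ODD", "GENE4"),
    ("ODD", "GENE4"),
    ("ODD", "GENE4"),
]

candidates = flag_single_partner_tags(example)
print(candidates)
```

Here GENE1 is seen equally often but with three different partners, so it is not flagged. A per-tag check of this kind targets individual suspicious ditags and leaves naturally occurring duplicates in the counts.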

CONCLUSION

The removal of all duplicate ditags was unfounded for the datasets analyzed and led to large errors. This may also be the case for other LongSAGE datasets already present in databases. Analysis of the ditag population, however, can identify artifact tags that should be removed from analysis or have their tag count adjusted.
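Why duplicates can arise without any artifact can be seen from a back-of-the-envelope calculation. It is a sketch under an assumption not stated in the abstract, namely that tags are ligated into ditags at random; the symbols $N$ (number of ditags in the library) and $p_i$ (relative abundance of tag $i$) are introduced here for illustration.

```latex
% Expected number of copies of the unordered ditag {i, j}
% in a library of N ditags under random ligation:
\[
  \mathbb{E}[n_{ij}] =
  \begin{cases}
    2 N p_i p_j, & i \neq j, \\
    N p_i^{2},   & i = j.
  \end{cases}
\]
% Worked example: N = 10\,000,\; p_i = 0.05,\; p_j = 0.02
\[
  \mathbb{E}[n_{ij}] = 2 \times 10\,000 \times 0.05 \times 0.02 = 20 .
\]
```

Under these assumed numbers, about 20 copies of the same ditag are expected by chance alone, and discarding duplicates would keep only one of them.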

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf57/1839111/ae8a729d6060/1471-2105-8-92-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf57/1839111/d6fabc17796d/1471-2105-8-92-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf57/1839111/9acc40c8dd23/1471-2105-8-92-2.jpg
