Incidence of "quasi-ditags" in catalogs generated by Serial Analysis of Gene Expression (SAGE).

Author information

Anisimov Sergey V, Sharov Alexei A

Affiliation

Section for Neuronal Survival, Wallenberg Neuroscience Center, Lund University, 221 84 Lund, Sweden.

Publication information

BMC Bioinformatics. 2004 Oct 18;5:152. doi: 10.1186/1471-2105-5-152.

PMID:15491492

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC526221/

Abstract

BACKGROUND

Serial Analysis of Gene Expression (SAGE) is a functional genomic technique that quantitatively analyzes the cellular transcriptome. The analysis of SAGE libraries relies on identifying ditags in sequencing files; however, the software used to examine SAGE libraries cannot distinguish authentic ditags from false ones ("quasi-ditags").
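To illustrate how ditags are typically identified in sequencing files, the Python sketch below scans a read for CATG, the recognition site of the anchoring enzyme (NlaIII in the standard SAGE protocol), and keeps each stretch between consecutive sites whose length falls inside a window. The function name `extract_ditags` and the default 20 to 26 nt window are hypothetical choices for illustration, not the parameters of any particular SAGE program. Because such a scan accepts any correctly sized stretch flanked by CATG, a piece of genomic DNA or a random sequence can pass it as readily as a genuine ditag.

```python
import re
from typing import List

# CATG is the recognition site of the anchoring enzyme (NlaIII in the
# standard SAGE protocol); ditags in a concatemer are separated by it.
ANCHOR = "CATG"


def extract_ditags(read: str, min_len: int = 20, max_len: int = 26) -> List[str]:
    """Return every stretch between consecutive CATG sites whose length
    (CATG sites excluded) lies in [min_len, max_len].

    The default window is a hypothetical choice for illustration only.
    """
    read = read.upper()
    # CATG cannot overlap itself, so finditer reports every occurrence.
    sites = [m.start() for m in re.finditer(ANCHOR, read)]
    ditags = []
    for left, right in zip(sites, sites[1:]):
        inner = read[left + len(ANCHOR):right]
        if min_len <= len(inner) <= max_len:
            ditags.append(inner)
    return ditags


if __name__ == "__main__":
    # Synthetic read: two correctly sized stretches and one that is too short.
    read = "CATG" + "A" * 22 + "CATG" + "T" * 5 + "CATG" + "G" * 24 + "CATG"
    print(extract_ditags(read))
```

The check is purely positional: nothing in it asks whether a stretch actually came from a transcript, which is the gap the quasi-ditag problem exploits.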

RESULTS

We provide examples of quasi-ditags that originate from cloning and sequencing artifacts (i.e., genomic contamination or random combinations of nucleotides) included in SAGE libraries. We employed a mathematical model to predict the frequency of quasi-ditags in random nucleotide sequences, and our data show that clones containing two or fewer ditags (a group that includes chromosomal cloning artifacts) should be excluded from the analysis of SAGE catalogs.
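A back-of-the-envelope version of such a model can be sketched in Python. This is not the authors' model: it assumes independent, identically distributed bases, approximates the spacing between consecutive CATG sites as geometric, ignores end effects, and uses a hypothetical 20 to 26 nt ditag window. Under these assumptions a CATG site starts at a given position with probability p = f_C · f_A · f_T · f_G (1/256 for uniform composition), and the next site follows after a gap in [min_len, max_len] with probability (1 - p)^min_len - (1 - p)^(max_len + 1). The function names are hypothetical, and the Poisson tail is a further approximation.

```python
import math
from typing import Dict, Optional


def catg_probability(base_freqs: Optional[Dict[str, float]] = None) -> float:
    """Probability that CATG starts at a given position of a random sequence
    with independent bases (uniform composition unless base_freqs is given)."""
    f = base_freqs or {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    return f["C"] * f["A"] * f["T"] * f["G"]


def expected_quasi_ditags(seq_len: int, min_len: int = 20, max_len: int = 26,
                          base_freqs: Optional[Dict[str, float]] = None) -> float:
    """Approximate expected number of CATG-flanked stretches of ditag size
    in a random sequence of length seq_len (end effects ignored)."""
    p = catg_probability(base_freqs)
    # Expected number of CATG sites among the seq_len - 3 possible start positions.
    n_sites = max(seq_len - 3, 0) * p
    # Geometric approximation: gap g >= 0 to the next site has P(g) = p * (1 - p)**g,
    # so P(min_len <= g <= max_len) telescopes to the difference below.
    in_window = (1 - p) ** min_len - (1 - p) ** (max_len + 1)
    return n_sites * in_window


def prob_at_least(k: int, lam: float) -> float:
    """Poisson approximation to P(at least k quasi-ditags) given mean lam."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


if __name__ == "__main__":
    lam = expected_quasi_ditags(500)
    print(f"expected quasi-ditags per 500 nt: {lam:.4f}")
    for k in (1, 2, 3):
        print(f"P(>= {k}) ~ {prob_at_least(k, lam):.2e}")
```

For a 500-nt random stretch with uniform composition, `expected_quasi_ditags(500)` comes out at roughly 0.05, and the chance of three or more falls off roughly as the cube of that mean. Under these simplifying assumptions, a clone yielding one or two ditags is far easier to explain by chance than one yielding many.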

CONCLUSIONS

Cloning and sequencing artifacts contaminating SAGE libraries could be eliminated using a simple pre-screening procedure to increase the reliability of the data.
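The pre-screening rule stated in the Results (exclude clones containing two or fewer ditags) can be sketched as a simple filter. The sketch assumes ditags have already been extracted and grouped by clone; the function name `prescreen_clones`, the clone-to-ditags dictionary layout, and the `min_ditags` parameter are hypothetical, and the clone IDs and ditag strings in the example are placeholders.

```python
from typing import Dict, List, Tuple


def prescreen_clones(
    clone_ditags: Dict[str, List[str]], min_ditags: int = 3
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split clones into kept and rejected sets.

    Clones yielding fewer than `min_ditags` ditags (two or fewer with the
    default) are treated as likely cloning or sequencing artifacts.
    """
    kept: Dict[str, List[str]] = {}
    rejected: List[str] = []
    for clone, ditags in clone_ditags.items():
        if len(ditags) >= min_ditags:
            kept[clone] = ditags
        else:
            rejected.append(clone)
    return kept, rejected


if __name__ == "__main__":
    # Placeholder clone IDs and ditag labels, not real SAGE data.
    clones = {
        "clone_1": ["d1"],
        "clone_2": ["d1", "d2"],
        "clone_3": ["d1", "d2", "d3", "d4"],
    }
    kept, rejected = prescreen_clones(clones)
    print("kept:", sorted(kept), "rejected:", rejected)
```

Returning the rejected clone IDs alongside the kept ones makes it easy to inspect what the filter discarded, for example to check whether those clones resemble genomic sequence.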
