Akmaev Viatcheslav R
Bioinformatics, Genzyme Corporation, Framingham, MA, USA.
Methods Mol Biol. 2008;387:133-42. doi: 10.1007/978-1-59745-454-4_10.
Serial analysis of gene expression (SAGE) is a powerful technique for measuring global gene expression by sampling transcript tags. SAGE tag collections, or libraries, are a rich data source for differential gene expression analysis, transcriptome mapping, and gene discovery. Transcriptome mapping and gene discovery are facilitated by extensions of SAGE such as Long SAGE, in which longer transcript tags are obtained by using a different tagging enzyme. Because SAGE is a sequencing-based technique, it is prone to errors that produce artifact tag sequences and erroneous tag counts. A method to pinpoint and correct tag artifacts is therefore necessary to fully exploit the value of large SAGE libraries. SAGEScreen is a tag sequence correction algorithm. It is a multistep procedure that estimates error rates and performs ditag and tag processing. The error rate estimates are based on a stochastic model of PCR- and sequencing-related mutations. The ditag processing step is essential for calculating unbiased tag counts, and the tag processing step filters out tag sequence artifacts and adjusts tag counts.
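The abstract does not give SAGEScreen's actual equations or thresholds, so the following is only a simplified Python sketch of the two processing ideas it names, not the published algorithm. It assumes 10-bp tags, ditags with the anchoring-enzyme site already stripped, identical ditags (in either orientation) treated as PCR duplicates, and a Poisson model with a single per-base error rate standing in for the paper's stochastic PCR/sequencing model. All function names and parameter values (`count_tags`, `correct_counts`, `error_rate=0.01`, `alpha=0.05`) are hypothetical.

```python
import math
from collections import Counter

TAG_LEN = 10  # assumption: short SAGE tag length (Long SAGE tags are longer)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_tags(ditags: list[str], tag_len: int = TAG_LEN) -> Counter:
    """Ditag processing: count each distinct ditag once, then split it into two tags."""
    # Assumption: identical ditags, in either orientation, are PCR duplicates.
    unique = {min(d, reverse_complement(d)) for d in ditags}
    counts: Counter = Counter()
    for d in unique:
        if len(d) < 2 * tag_len:
            continue  # too short to hold two tags
        counts[d[:tag_len]] += 1                        # first tag, read 5'->3'
        counts[reverse_complement(d[-tag_len:])] += 1  # second tag lies on the opposite strand
    return counts


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def one_mismatch_neighbors(tag: str):
    """Yield every sequence that differs from `tag` by one substitution."""
    for i, base in enumerate(tag):
        for alt in "ACGT":
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def correct_counts(counts, error_rate: float = 0.01, alpha: float = 0.05) -> Counter:
    """Tag processing (hypothetical rule): fold a rarer one-mismatch neighbour into
    an abundant tag when its count is plausible as substitution errors of that tag."""
    corrected = Counter(counts)
    # Visit tags from most to least abundant so error copies go to their likely source.
    for tag, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if tag not in corrected:
            continue  # already merged into a more abundant tag
        n = corrected[tag]
        expected = n * error_rate / 3  # expected copies of one specific substitution
        for variant in one_mismatch_neighbors(tag):
            k = corrected.get(variant, 0)
            if 0 < k < n and poisson_sf(k, expected) > alpha:
                corrected[tag] += k
                del corrected[variant]
    return corrected
```

For instance, with 1,000 copies of `AAAAAAAAAA` and 2 copies of `AAAAAAAAAC`, the two variant copies fall within the roughly 3.3 expected substitution errors and are folded into the abundant tag, whereas a variant seen 20 times next to a tag seen 100 times is kept as a distinct tag.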