Bianchetti Laurent, Wu Yan, Guerin Eric, Plewniak Frédéric, Poch Olivier
Plate-forme Bioinformatique de Strasbourg, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS/INSERM/ULP) BP 163, 67404 Illkirch Cedex, France.
Nucleic Acids Res. 2007;35(18):e122. doi: 10.1093/nar/gkm648. Epub 2007 Sep 20.
SAGE (Serial Analysis of Gene Expression) experiments generate short nucleotide sequences called 'tags', each of which is assumed to map unambiguously to the transcript it came from (one tag to one transcript). In practice, many tags map to no transcript or to several transcripts. Current bioinformatics resources, such as SAGEmap and TAGmapper, have focused on reducing the number of unmapped tags. Here, we describe SAGETTARIUS, a new high-throughput program that performs successive, precise Nla3 and Sau3A tag-to-transcript mapping based on specifically designed Virtual Tag (VT) libraries. First, SAGETTARIUS decreases the number of tags mapped to multiple transcripts: among the mapping resources compared, it performed best in this respect, reducing the number of multiply mapped tags by up to 11%. Second, SAGETTARIUS makes it possible to establish a guideline for SAGE sequencing effort through efficient mapping of tags specific to CRT (Cytoplasmic Ribosomal protein Transcripts). Using all publicly available human and mouse Nla3 SAGE experiments, we show that sequencing 100,000 tags is sufficient to map almost all CRT-specific tags and that four sequencing stages can be identified when carrying out a human or mouse SAGE project. SAGETTARIUS has a web interface and is freely accessible to academic users.
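The abstract's idea of mapping observed tags against a virtual tag library can be illustrated with a minimal sketch. The abstract does not describe how SAGETTARIUS builds its VT libraries, so the code below uses the conventional SAGE rule instead: the tag is taken as the bases immediately 3' of the 3'-most anchoring-enzyme site (CATG for NlaIII, GATC for Sau3A). The 10-base tag length, the function names, and the toy transcripts are all assumptions made for illustration and are not taken from the program itself.

```python
from collections import defaultdict

NLA3_SITE = "CATG"   # NlaIII recognition site
SAU3A_SITE = "GATC"  # Sau3A recognition site
TAG_LENGTH = 10      # assumption: classic short-SAGE tag length


def virtual_tag(transcript, site, tag_length=TAG_LENGTH):
    """Return the tag that follows the 3'-most `site`, or None if there is none."""
    seq = transcript.upper()
    pos = seq.rfind(site)            # 3'-most occurrence of the enzyme site
    if pos == -1:
        return None                  # the transcript has no anchoring site
    start = pos + len(site)
    tag = seq[start:start + tag_length]
    # Discard tags truncated by the end of the transcript.
    return tag if len(tag) == tag_length else None


def build_vt_library(transcripts, site):
    """Map each virtual tag to the set of transcript names that yield it."""
    library = defaultdict(set)
    for name, seq in transcripts.items():
        tag = virtual_tag(seq, site)
        if tag is not None:
            library[tag].add(name)
    return library


def classify(tag, library):
    """Label an observed tag as 'unmapped', 'unique' or 'multiple'."""
    hits = library.get(tag, set())
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "multiple"


# Toy transcripts (hypothetical): tx_a and tx_b share the same 3'-most tag.
transcripts = {
    "tx_a": "GG" + "CATG" + "AAAAACCCCC" + "TT",
    "tx_b": "TT" + "CATG" + "AAAAACCCCC" + "GG",
    "tx_c": "CATG" + "GGGGGTTTTT" + "AA",
}
nla3_library = build_vt_library(transcripts, NLA3_SITE)

for observed in ("AAAAACCCCC", "GGGGGTTTTT", "ACGTACGTAC"):
    print(observed, classify(observed, nla3_library))
```

In this toy library the first tag is multiply mapped, the second is unique, and the third is unmapped, which are the three outcomes the abstract distinguishes. Passing `SAU3A_SITE` to `build_vt_library` would build the corresponding library for the second enzyme. How the two mappings are combined in succession is not stated in the abstract and is left out here.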