Keime Céline, Damiola Francesca, Mouchiroud Dominique, Duret Laurent, Gandrillon Olivier
Equipe Signalisation et identités cellulaires, Centre de Génétique Moléculaire et Cellulaire CNRS UMR 5534, Université Claude Bernard Lyon 1, bâtiment Gregor Mendel, 16 rue Raphaël Dubois 69622 Villeurbanne cedex France.
BMC Bioinformatics. 2004 Oct 6;5:143. doi: 10.1186/1471-2105-5-143.
Serial Analysis of Gene Expression (SAGE) is a method of large-scale gene expression analysis that has the potential to generate the full list of mRNAs present within a cell population at a given time, together with their frequencies. An essential step in SAGE library analysis is the unambiguous assignment of each 14 bp tag to the transcript from which it was derived. This process, called tag-to-gene mapping, remains a step in need of improvement. Indeed, the existing web sites that provide correspondence between tags and transcripts do not cover all species for which numerous ESTs and cDNAs have already been sequenced.
For this reason, we designed and implemented Identitag, a freely available tool for tag identification that can be used in any species for which transcript sequences are available. Identitag is built on a relational database structure, which allows rapid and easy storage and updating of data and, most importantly, precise definition of the identification parameters. This structure can be viewed as three interconnected modules: the first stores virtual tags extracted from a given list of transcript sequences, the second stores experimental tags observed in SAGE experiments, and the third holds the annotation of the transcript sequences used for virtual tag extraction. An observed tag is thus connected to a virtual tag, to the sequence it comes from, and then to the functional annotation of that sequence when available. Databases built for different species can be connected through orthology relationships, which allows SAGE libraries to be compared between species. We successfully used Identitag to identify tags from our chicken SAGE libraries and to compare chicken and human SAGE tags. Identitag sources are freely available at http://pbil.univ-lyon1.fr/software/identitag/.
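The three-module structure described above can be illustrated with a minimal sketch. This is not Identitag's actual schema or code: the table names, column names, and function names are hypothetical, SQLite stands in for whatever database system the tool uses, and the extraction rule (the 14 bp starting at the 3'-most CATG site, as in the standard SAGE protocol with NlaIII as anchoring enzyme) is an assumption, since the abstract only states the tag length.

```python
import sqlite3

ANCHOR = "CATG"   # assumption: NlaIII anchoring site, not named in the abstract
TAG_LENGTH = 14   # tag length stated in the abstract (anchor site + 10 bp)


def extract_virtual_tag(sequence):
    """Return the 14 bp tag starting at the 3'-most anchor site, or None."""
    seq = sequence.upper()
    pos = seq.rfind(ANCHOR)          # 3'-most occurrence of the anchor site
    if pos == -1:
        return None                  # no anchor site in this transcript
    tag = seq[pos:pos + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None  # reject truncated tags


def build_database(transcripts, observed, annotations):
    """Create the three hypothetical modules in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE virtual_tag (tag TEXT, transcript_id TEXT);
        CREATE TABLE experimental_tag (tag TEXT, library TEXT, occurrences INTEGER);
        CREATE TABLE annotation (transcript_id TEXT PRIMARY KEY, description TEXT);
    """)
    # Module 1: virtual tags extracted from the transcript sequences.
    for transcript_id, sequence in transcripts.items():
        tag = extract_virtual_tag(sequence)
        if tag is not None:
            conn.execute("INSERT INTO virtual_tag VALUES (?, ?)",
                         (tag, transcript_id))
    # Module 2: experimental tags observed in SAGE libraries.
    conn.executemany("INSERT INTO experimental_tag VALUES (?, ?, ?)", observed)
    # Module 3: annotation of the transcripts used for extraction.
    conn.executemany("INSERT INTO annotation VALUES (?, ?)",
                     list(annotations.items()))
    return conn


def identify_tags(conn):
    """Link each observed tag to its transcript and annotation, if any."""
    query = """
        SELECT e.tag, e.occurrences, v.transcript_id, a.description
        FROM experimental_tag AS e
        LEFT JOIN virtual_tag AS v ON v.tag = e.tag
        LEFT JOIN annotation AS a ON a.transcript_id = v.transcript_id
        ORDER BY e.tag, v.transcript_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Made-up sequences and counts, for illustration only.
    transcripts = {"T1": "GGGCATGAAACCCGGGTTTAAAA"}
    observed = [("CATGAAACCCGGGT", "lib1", 12), ("CATGCCCCCCCCCC", "lib1", 3)]
    annotations = {"T1": "hypothetical gene A"}
    for row in identify_tags(build_database(transcripts, observed, annotations)):
        print(row)
```

The left joins keep observed tags that match no virtual tag, so unidentified tags remain visible in the result instead of being silently dropped. A tag shared by several transcripts would appear once per matching transcript, which is where more precise identification parameters would come into play.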
Identitag is a flexible and powerful tool for tag identification in any single species and for interspecies comparison of SAGE libraries. It opens the way to comparative transcriptomic analysis, an emerging branch of biology.