Pinheiro Daniel G, Galante Pedro A F, de Souza Sandro J, Zago Marco A, Silva Wilson A
Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil.
BMC Bioinformatics. 2009 Jun 6;10:170. doi: 10.1186/1471-2105-10-170.
High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), are powerful techniques that provide global transcription profiles of different cell types by sequencing short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between expression profiles and cellular phenotypes. Nevertheless, more reliable datasets are still needed. In this work, we present a web-based tool named S3T (Score System for Sequence Tags) that indexes sequenced tags according to their reliability. Indexing is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of the tags considered more reliable for further gene expression analysis.
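As a rough illustration of what scoring tags against a rule set can look like, the sketch below sums the weights of the rules a tag passes and keeps the tags whose score reaches a threshold. The two rules, their weights, and every function name here are hypothetical placeholders chosen for the example; they are not taken from the S3T rule set.

```python
from typing import Callable, Dict, List, Tuple

# A rule is (name, weight, predicate over the tag sequence and its count).
# All rules and weights below are illustrative assumptions, not S3T's rules.
Rule = Tuple[str, float, Callable[[str, int], bool]]


def seen_more_than_once(tag: str, count: int) -> bool:
    """Hypothetical rule: singleton tags are treated as less reliable."""
    return count > 1


def not_homopolymer(tag: str, count: int) -> bool:
    """Hypothetical rule: tags made of a single repeated base are penalized."""
    return len(set(tag)) > 1


RULES: List[Rule] = [
    ("seen_more_than_once", 1.0, seen_more_than_once),
    ("not_homopolymer", 0.5, not_homopolymer),
]


def score_tag(tag: str, count: int, rules: List[Rule] = RULES) -> float:
    """Sum the weights of all rules the tag satisfies."""
    return sum(weight for _, weight, passes in rules if passes(tag, count))


def filter_tags(library: Dict[str, int], threshold: float,
                rules: List[Rule] = RULES) -> Dict[str, int]:
    """Keep only the tags whose score is at least `threshold`."""
    return {tag: count for tag, count in library.items()
            if score_tag(tag, count, rules) >= threshold}
```

Under these assumed rules, a library would be reduced with a call such as `filter_tags(library, threshold=1.5)`, where `library` maps each tag sequence to its observed count.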
This methodology was applied to a public SAGE dataset. To compare the data before and after filtering, hierarchical clustering analysis was performed on both datasets, using samples from the same tissue type under distinct biological conditions. Our results provide evidence suggesting that more congruous clusters can be found after applying the S3T scoring system.
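The comparison described above can be sketched in a few lines: cluster the sample profiles once with all tags and once with only the retained tags, then inspect which samples group together. The example below is a minimal pure-Python sketch that assumes a Pearson-correlation distance and average linkage; the abstract does not state which distance or linkage was used, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # constant profile: treat as uncorrelated
    return 1.0 - cov / (sd_x * sd_y)


def average_linkage(profiles):
    """Agglomerative clustering of samples.

    `profiles` maps a sample name to a list of tag counts (same tag order
    for every sample). Returns the merges in order, each as a pair of
    clusters, where a cluster is a sorted tuple of sample names.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def distance(c1, c2):
        # Average of all pairwise distances between the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b))
    return merges
```

Running `average_linkage` on the unfiltered profiles and again on profiles restricted to the retained tags gives two merge orders that can be compared directly, for example by checking whether samples from the same biological condition merge with each other before merging with other conditions.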
These results support the use of the proposed application to generate more reliable data, which is a significant contribution to the determination of global gene expression profiles. Library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. The S3T source code and datasets can also be downloaded from the same website.