González Francisco Javier, Vizcaíno Juan Antonio
EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Methods Mol Biol. 2011;722:103-20. doi: 10.1007/978-1-61779-040-9_7.
This chapter describes how a pipeline for the analysis of expressed sequence tag (EST) data can be implemented, based on our previous experience generating ESTs from Trichoderma spp. We focus on key steps in the workflow, such as the processing of raw data from the sequencers, the clustering of ESTs, and the functional annotation of the sequences using BLAST, InterProScan, and BLAST2GO. Some of the steps require substantial computing power. Since such resources are not available to small research groups or institutes without bioinformatics support, we also describe an alternative: the use of distributed computing resources (local grids and Amazon EC2).
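As a rough illustration of the first workflow step named in the abstract (processing raw reads before clustering), the following is a minimal sketch in Python. It is not the chapter's pipeline: the function names, the 100-base length cutoff, and the 10-base poly-A/poly-T run threshold are all assumptions chosen for the example, and a real workflow would also handle base calling, quality trimming, and vector screening.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH = 100  # hypothetical cutoff: discard ESTs shorter than this after trimming
POLY_RUN = 10     # hypothetical threshold: minimum run length treated as a poly-A/T tail


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-delimited token as the identifier.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def trim_poly_tails(seq: str, min_run: int = POLY_RUN) -> str:
    """Remove a trailing poly-A run and a leading poly-T run of at least min_run bases."""
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    return seq


def clean_ests(records: Iterable[Tuple[str, str]],
               min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Trim each EST and keep only those that are still long enough."""
    kept = {}
    for name, seq in records:
        trimmed = trim_poly_tails(seq)
        if len(trimmed) >= min_length:
            kept[name] = trimmed
    return kept


if __name__ == "__main__":
    example = [
        ">est1 example read with a poly-A tail",
        "ACGT" * 30,
        "A" * 15,
        ">est2 too short to keep",
        "ACGT" * 5,
    ]
    cleaned = clean_ests(parse_fasta(example))
    for est_id, est_seq in cleaned.items():
        print(est_id, len(est_seq))
```

The cleaned sequences would then be passed to the clustering and annotation stages; those rely on external tools (BLAST, InterProScan, BLAST2GO) and are not sketched here.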