Lazzari Barbara, Caprera Andrea, Vecchietti Alberto, Merelli Ivan, Barale Francesca, Milanesi Luciano, Stella Alessandra, Pozzi Carlo
Parco Tecnologico Padano, Via Einstein - Località Cascina Codazza, Lodi, 26900, Italy.
BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S9. doi: 10.1186/1471-2105-9-S2-S9.
The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that, in its current version, encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to improve the existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, which allows each of the database features to be queried.
A modular Perl pipeline is the backbone of sequence analysis in the ESTree db project. Outputs from the pipeline steps are automatically loaded into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, for annotation against genomic Rosaceae sequences, and for positioning on the database sequences the oligomers that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track the positions of homologous hits on the GO tree and to build statistics on the distribution of ontologies across GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The authors' aim was to make the database queryable according to all the biological aspects that can be investigated from the data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features.
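The GO statistics step described above can be illustrated with a minimal sketch. It is not the authors' Perl script: it is written in Python for brevity, and the function name `category_distribution`, the EST identifiers, and the toy lookup tables are all hypothetical. It assumes each EST has already been assigned GO terms through its UniProtKB hits, and it counts how many ESTs fall into each of the three top-level GO categories.

```python
from collections import Counter

# Toy lookup (assumption): GO term -> top-level GO category.
# A real pipeline would derive this from the GO ontology files.
GO_NAMESPACE = {
    "GO:0006412": "biological_process",   # translation
    "GO:0005634": "cellular_component",   # nucleus
    "GO:0016020": "cellular_component",   # membrane
    "GO:0003677": "molecular_function",   # DNA binding
}

# Hypothetical ESTs with GO terms inherited from their UniProtKB hits.
EST_HITS = {
    "EST_0001": ["GO:0006412", "GO:0005634"],
    "EST_0002": ["GO:0003677", "GO:0005634"],
    "EST_0003": ["GO:0016020"],
}


def category_distribution(est_hits, namespace_of):
    """Count ESTs per top-level GO category.

    An EST is counted at most once per category, even if several of
    its GO terms belong to the same category. Terms missing from the
    lookup table are ignored.
    """
    counts = Counter()
    for terms in est_hits.values():
        categories = {namespace_of[t] for t in terms if t in namespace_of}
        counts.update(categories)
    return counts


if __name__ == "__main__":
    for category, n in sorted(category_distribution(EST_HITS, GO_NAMESPACE).items()):
        print(f"{category}\t{n}")
```

Counting each EST once per category is a design choice made for this sketch; a per-term count would instead weight heavily annotated sequences more strongly.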
Version VI of the ESTree db offers a broad overview of peach gene expression. The sequence analysis results contained in the database, extensively linked to related external resources, represent a large amount of information that can be queried via the tools offered in the web interface. The flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets with limited manual intervention.