Kasprzyk Arek, Keefe Damian, Smedley Damian, London Darin, Spooner William, Melsopp Craig, Hammond Martin, Rocca-Serra Philippe, Cox Tony, Birney Ewan
European Bioinformatics Institute (EBI), Hinxton, Cambridge CB10 1SH, UK.
Genome Res. 2004 Jan;14(1):160-9. doi: 10.1101/gr.1645104.
The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries is supported, covering many types of annotation across numerous species. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site or through a Java application suite. Both implementations and the database are freely available for local installation and can be extended or adapted to 'non-Ensembl' data sets.
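The abstract attributes EnsMart's speed to querying a query-optimized database, with users refining data by criteria and receiving tabulated output. The following is a minimal sketch of that general idea, not EnsMart's implementation: it assumes a single denormalized table loaded into Python's standard-library `sqlite3`. The table name `gene_main`, its columns, the sample rows, and the functions `build_mart`, `query_mart`, and `to_tsv` are all hypothetical. User criteria (here called filters) become `WHERE` conditions, the chosen output columns (here called attributes) become the `SELECT` list, and the result is written as tab-separated text, one of the output formats the abstract lists.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: one denormalized row per gene, so every
# column a user might filter on or output is available without joins.
ROWS = [
    ("gene1", "GENE_A", "1", 1000, 5000, 1, 3),
    ("gene2", "GENE_B", "1", 8000, 9500, -1, 0),
    ("gene3", "GENE_C", "2", 200, 4200, 1, 7),
]

# Whitelists: column names and operators cannot be bound as SQL
# parameters, so they are validated before being placed in the query.
ALLOWED_COLUMNS = ("gene_id", "gene_name", "chromosome",
                   "seq_start", "seq_end", "strand", "snp_count")
ALLOWED_OPS = ("=", ">=", "<=")


def build_mart(rows):
    """Load rows into an in-memory, denormalized 'gene_main' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_main (gene_id TEXT, gene_name TEXT, "
        "chromosome TEXT, seq_start INTEGER, seq_end INTEGER, "
        "strand INTEGER, snp_count INTEGER)"
    )
    conn.executemany("INSERT INTO gene_main VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    # Index a commonly filtered column to speed up selective queries.
    conn.execute("CREATE INDEX idx_gene_main_chrom ON gene_main (chromosome)")
    return conn


def query_mart(conn, filters, attributes):
    """Apply (column, op, value) filters and return the chosen attributes."""
    if not attributes or any(a not in ALLOWED_COLUMNS for a in attributes):
        raise ValueError("unknown or empty attribute list")
    clauses, params = [], []
    for column, op, value in filters:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"invalid filter: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)  # values are bound, never interpolated
    sql = f"SELECT {', '.join(attributes)} FROM gene_main"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY gene_id"  # deterministic output order
    return conn.execute(sql, params).fetchall()


def to_tsv(attributes, rows):
    """Render a header line plus result rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(attributes)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    mart = build_mart(ROWS)
    attrs = ["gene_id", "gene_name", "snp_count"]
    # Genes on chromosome 1 that carry at least one SNP.
    hits = query_mart(mart, [("chromosome", "=", "1"), ("snp_count", ">=", 1)], attrs)
    print(to_tsv(attrs, hits), end="")
```

Because every annotation sits in one row, a query needs no joins. Column names and operators are checked against a whitelist, while user-supplied values are passed as bound parameters.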