Knox Craig, Shrivastava Savita, Stothard Paul, Eisner Roman, Wishart David S
Department of Computing Science, University of Alberta, Edmonton, AB T6G-2E8 Canada.
Pac Symp Biocomput. 2007:145-56.
A growing challenge in life science research is finding useful descriptive or quantitative data about newly reported biomolecules (genes, proteins, metabolites and drugs). An even greater challenge is finding information that connects these genes, proteins, drugs or metabolites to each other. Much of this information is scattered across hundreds of different databases, abstracts or books, and almost none of it is well integrated. While efforts are under way at the NCBI and EBI to integrate many different databases, they still fall short of the goal of a human-readable synopsis that summarizes the state of knowledge about a given biomolecule, especially a small molecule. To address this shortfall, we have developed BioSpider, an automated report generator designed to tabulate and summarize data on biomolecules, both large and small. BioSpider allows users to type in almost any kind of biological or chemical identifier (protein/gene name, sequence, accession number, chemical name, brand name, SMILES string, InChI string, CAS number, etc.) and returns an in-depth synoptic report (approximately 3-30 pages in length) about that biomolecule and any other biomolecule it may target. The report includes physico-chemical parameters, images, models, data files, descriptions and predictions concerning the query molecule. BioSpider uses a web crawler to scan dozens of public databases and employs a variety of specially developed text mining tools and locally developed prediction tools to find, extract and assemble data for its reports. Because of its breadth, depth and comprehensiveness, we believe BioSpider will prove to be a particularly valuable tool for researchers in metabolomics. BioSpider is available at: www.biospider.ca
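As an illustration of the first step described above, accepting "almost any kind" of identifier, the following is a minimal sketch of how a free-text query might be sorted into an identifier type before any database is searched. It is not BioSpider's actual implementation, which the abstract does not describe. The function names (`is_valid_cas`, `classify_query`), the category labels and the 20-character sequence threshold are all assumptions made for this example. The CAS check-digit rule and the `InChI=` prefix are standard properties of those identifier formats.

```python
import re

# CAS Registry Numbers have the form NNNNNNN-NN-N (2 to 7 digits, 2 digits, 1 check digit).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Nucleotide letters (DNA, RNA and the ambiguity code N).
NUCLEOTIDES = set("ACGTUN")

# Hypothetical cutoff: shorter all-letter strings are treated as names.
MIN_SEQUENCE_LENGTH = 20


def is_valid_cas(query: str) -> bool:
    """Return True if query is a well-formed CAS number with a correct check digit."""
    match = CAS_PATTERN.match(query)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... from right to left, then take the sum modulo 10.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


def classify_query(query: str) -> str:
    """Guess the identifier type of a query string (illustrative labels only)."""
    text = query.strip()
    if text.startswith("InChI="):
        return "inchi"
    if is_valid_cas(text):
        return "cas_number"
    upper = text.upper()
    if len(upper) >= MIN_SEQUENCE_LENGTH:
        # Test nucleotides first: their letters are a subset of valid protein letters.
        if set(upper) <= NUCLEOTIDES:
            return "nucleotide_sequence"
        if set(upper) <= AMINO_ACIDS:
            return "protein_sequence"
    # Anything else falls through to a name lookup (gene, protein, chemical or brand name).
    return "name"
```

For example, `classify_query("58-08-2")` (the CAS number for caffeine) yields `"cas_number"`, while `classify_query("caffeine")` falls through to `"name"`. SMILES strings and accession numbers are deliberately left out of this sketch, because recognizing them reliably needs more than a simple pattern test.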