Vailaya Aditya, Bluvas Peter, Kincaid Robert, Kuchinsky Allan, Creech Michael, Adler Annette
Agilent Laboratories, 3500 Deer Creek Road, MS 26U-16, Palo Alto, CA 94304, USA.
Bioinformatics. 2005 Feb 15;21(4):430-8. doi: 10.1093/bioinformatics/bti187. Epub 2004 Dec 17.
Technological advances in biomedical research are generating a plethora of heterogeneous data at a high rate. There is a critical need for extraction, integration and management tools for information discovery and synthesis from these heterogeneous data.
In this paper, we present a general architecture, called ALFA, for information extraction and representation from diverse biological data. The ALFA architecture consists of: (i) a networked, hierarchical, hyper-graph object model for representing information from heterogeneous data sources in a standardized, structured format; and (ii) a suite of integrated, interactive software tools for information extraction and representation from diverse biological data sources. As part of our research efforts to explore this space, we have prototyped the ALFA object model and a set of interactive software tools for searching, filtering and extracting information from scientific text. In particular, we describe BioFerret, a meta-search tool for searching and filtering relevant information from the web, and ALFA Text Viewer, an interactive tool for user-guided extraction, disambiguation and representation of information from scientific text. We further demonstrate the potential of our tools for integrating the extracted information with experimental data and diagrammatic biological models via the common underlying ALFA representation.
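The abstract does not specify the schema of the ALFA object model. The following is only a minimal Python sketch of what a "hierarchical hyper-graph" could look like. It assumes that "hyper-graph" means a relation (hyperedge) may join any number of entities at once, and that "hierarchical" means a node may contain a nested graph. The class and field names (`Node`, `HyperEdge`, `HyperGraph`, `source`, `subgraph`, `walk`) and the example entities (`GeneA`, `GeneB`, `GeneC`, `P1`) are hypothetical and are not taken from ALFA.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class Node:
    """An entity such as a gene or a pathway (hypothetical field names)."""
    node_id: str
    kind: str
    # Hypothetical provenance field: where the entity came from (text, dataset, diagram).
    source: Optional[str] = None
    # Hierarchy: a node may itself contain a nested graph.
    subgraph: Optional["HyperGraph"] = None


@dataclass
class HyperEdge:
    """A relation that may connect two or more nodes at once."""
    edge_id: str
    relation: str
    members: Set[str]


@dataclass
class HyperGraph:
    name: str
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, HyperEdge] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: HyperEdge) -> None:
        # Reject relations that mention nodes not present in this graph.
        missing = edge.members - set(self.nodes)
        if missing:
            raise KeyError(f"unknown node ids: {sorted(missing)}")
        self.edges[edge.edge_id] = edge

    def edges_of(self, node_id: str) -> List[HyperEdge]:
        # All relations in which the given node participates.
        return [e for e in self.edges.values() if node_id in e.members]

    def walk(self) -> Iterator[Node]:
        # Depth-first traversal over this graph and every nested subgraph.
        for node in self.nodes.values():
            yield node
            if node.subgraph is not None:
                yield from node.subgraph.walk()


if __name__ == "__main__":
    inner = HyperGraph("extracted-relations")
    for gid in ("GeneA", "GeneB", "GeneC"):
        inner.add_node(Node(gid, "gene", source="abstract text"))
    # A single hyperedge joins three entities in one relation.
    inner.add_edge(HyperEdge("r1", "forms_complex", {"GeneA", "GeneB", "GeneC"}))

    top = HyperGraph("model")
    top.add_node(Node("P1", "pathway", subgraph=inner))

    print([n.node_id for n in top.walk()])  # ['P1', 'GeneA', 'GeneB', 'GeneC']
    print(len(inner.edges_of("GeneB")))     # 1
```

In this sketch, a hyperedge stores an unordered set of members. A directed or role-annotated relation, such as regulator versus target, would need a richer member type.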