Automated structure extraction and XML conversion of life science database flat files.

Author information

Philippi Stephan, Köhler Jacob

Affiliation

University of Koblenz, Koblenz, Germany.

Publication information

IEEE Trans Inf Technol Biomed. 2006 Oct;10(4):714-21. doi: 10.1109/titb.2006.875653.

Abstract

In light of the increasing number of biological databases, their integration is a fundamental prerequisite for answering complex biological questions. Database integration, therefore, is an important area of research in bioinformatics. Since most of the publicly available life science databases are still exchanged exclusively by means of proprietary flat files, database integration requires parsers for very different flat file formats. Unfortunately, the development and maintenance of database-specific flat file parsers is a nontrivial and time-consuming task, which takes considerable effort in large-scale integration scenarios. This paper introduces heuristic concepts for automatic structure extraction from life science database flat files. On the basis of these concepts, the FlatEx prototype was developed for the automatic conversion of flat files into XML representations.
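The abstract does not spell out the heuristics FlatEx uses. As a rough illustration of the kind of structure extraction involved, the sketch below assumes a line-coded layout similar to EMBL or UniProtKB/Swiss-Prot flat files: a short uppercase line code, whitespace, then a value, with each entry terminated by `//`. It maps such text to XML. The function `flat_to_xml`, the element names `database`, `entry`, `field` and `unparsed`, and the key-detection rule are all hypothetical choices for this example, not the FlatEx algorithm.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heuristic: a line of the form "<KEY><whitespace><value>", where KEY
# is a short run of uppercase letters/digits, starts or continues a field.
FIELD_RE = re.compile(r"^([A-Z0-9]{2,4})\s+(.*)$")
ENTRY_TERMINATOR = "//"


def flat_to_xml(text: str) -> ET.Element:
    """Convert line-coded flat-file text into an XML tree (illustrative sketch)."""
    root = ET.Element("database")
    entry = None       # <entry> element currently being filled
    last_key = None    # line code of the previous line in this entry
    last_field = None  # <field> element created for last_key
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line == ENTRY_TERMINATOR:
            # "//" closes the current entry.
            entry, last_key, last_field = None, None, None
            continue
        if entry is None:
            entry = ET.SubElement(root, "entry")
        m = FIELD_RE.match(line)
        if m is None:
            # Unrecognized line: keep it rather than silently dropping data.
            ET.SubElement(entry, "unparsed").text = line
            last_key, last_field = None, None
            continue
        key, value = m.groups()
        if key == last_key and last_field is not None:
            # Consecutive lines with the same code are merged into one field.
            last_field.text = f"{last_field.text} {value}"
        else:
            last_field = ET.SubElement(entry, "field", name=key)
            last_field.text = value
        last_key = key
    return root


if __name__ == "__main__":
    sample = "ID   P12345\nDE   Example protein\nDE   fragment.\n//\n"
    print(ET.tostring(flat_to_xml(sample), encoding="unicode"))
```

Merging consecutive lines that share a line code is only one simple heuristic for multi-line fields; a general converter would also have to infer nested structure (such as feature tables) and tell repeated fields apart from continuation lines.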