An XML transfer schema for exchange of genomic and genetic mapping data: implementation as a web service in a Taverna workflow.

Author information

Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, UK.

Publication information

BMC Bioinformatics. 2009 Aug 14;10:252. doi: 10.1186/1471-2105-10-252.

Abstract

BACKGROUND

Genomic analysis, particularly for less well-characterized organisms, is greatly assisted by performing comparative analyses between different types of genome maps and across species boundaries. Various providers publish a plethora of on-line resources collating genome mapping data from a multitude of species. Datasources range in scale and scope from small bespoke resources for particular organisms, through larger web-resources containing data from multiple species, to large-scale bioinformatics resources providing access to data derived from genome projects for model and non-model organisms. The heterogeneity of information held in these resources reflects both the technologies used to generate the data and the target users of each resource. Currently there is no common information exchange standard or protocol to enable access and integration of these disparate resources. Consequently data integration and comparison must be performed in an ad hoc manner.

RESULTS

We have developed a simple generic XML schema (GenomicMappingData.xsd - GMD) to allow export and exchange of mapping data in a common lightweight XML document format. This schema represents the various types of data objects commonly described across mapping datasources and provides a mechanism for recording relationships between data objects. The schema is sufficiently generic to allow representation of any map type (for example genetic linkage maps, radiation hybrid maps, sequence maps and physical maps). It also provides mechanisms for recording data provenance and for cross-referencing external datasources (including, for example, ENSEMBL, PubMed and GenBank). The schema is extensible via the inclusion of additional datatypes, which can be achieved by importing further schemas, e.g. a schema defining relationship types. We have built demonstration web services that export data from our ArkDB database according to the GMD schema, facilitating the integration of data retrieval into Taverna workflows.
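To illustrate the idea of generic data objects linked by typed relationships, the following is a minimal sketch of how a client might read such a document. Every element and attribute name here (`mappingData`, `map`, `marker`, `xref`, `relationship`) and all values are hypothetical placeholders; they are not taken from GenomicMappingData.xsd, and the real schema may be organised quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. Element names, attribute names and values are
# illustrative assumptions only, NOT the actual GMD schema vocabulary.
DOC = """\
<mappingData source="ExampleDB">
  <map id="map1" type="linkage"/>
  <map id="map2" type="radiation_hybrid"/>
  <marker id="m1" name="MARKER_A">
    <xref db="PubMed" accession="PLACEHOLDER"/>
  </marker>
  <relationship type="located_on" subject="m1" object="map1"
                position="12.5" units="cM"/>
  <relationship type="located_on" subject="m1" object="map2"
                position="340.0" units="cR"/>
</mappingData>
"""


def placements(xml_text):
    """Return (marker name, map type, position, units) for each placement."""
    root = ET.fromstring(xml_text)
    # Index the data objects by id so relationships can be resolved.
    map_types = {m.get("id"): m.get("type") for m in root.findall("map")}
    marker_names = {m.get("id"): m.get("name") for m in root.findall("marker")}

    result = []
    for rel in root.findall("relationship"):
        if rel.get("type") != "located_on":
            continue  # skip relationship types this sketch does not handle
        result.append((
            marker_names[rel.get("subject")],
            map_types[rel.get("object")],
            float(rel.get("position")),
            rel.get("units"),
        ))
    return result


for name, map_type, position, units in placements(DOC):
    print(f"{name} on {map_type} map at {position} {units}")
```

Because the map type is just an attribute of a generic map object in this sketch, the same code handles linkage and radiation hybrid placements without a separate structure for each map type, which is the kind of genericity the paragraph above describes.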

CONCLUSION

The data exchange standard we present here provides a useful generic format for transfer and integration of genomic and genetic mapping data. The extensibility of our schema allows for inclusion of additional data and provides a mechanism for typing mapping objects via third party standards. Web services retrieving GMD-compliant mapping data demonstrate that use of this exchange standard provides a practical mechanism for achieving data integration, by facilitating syntactically and semantically-controlled access to the data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4be/2743669/d3acb3ce3fe3/1471-2105-10-252-1.jpg
