IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1504-1515. doi: 10.1109/TCBB.2019.2951137. Epub 2019 Nov 4.
The development of next-generation sequencing (NGS) technologies has produced massive numbers of VCF (Variant Call Format) files, the standard format developed with the 1000 Genomes Project. At the same time, with the widespread use of biomedical ontologies in the biomedical community, more and more applications have adopted the Web Ontology Language (OWL) as the dominant data format for specifying biomedical ontologies, leading to rapid growth in the scale of OWL-based biomedical ontologies. In this paper, we explore an effective method for managing VCF-based genetic variants and OWL-based biological ontologies using the MongoDB database. Because many current applications (such as dbSNP, the database of short genetic variations) are transitioning to new designs that use JSON (JavaScript Object Notation) to support future massive data expansion and interchange, we first propose a series of rules for mapping VCF and OWL files to JSON files, and then present rule-based algorithms for transforming VCF-based genetic variants and OWL-based biological ontologies into JSON objects. On this basis, we introduce effective approaches for integrating the mapped JSON files into MongoDB. Finally, we complement this work with a set of experiments that show the performance of the proposed approaches. The source code of the proposed approaches is freely available at https://github.com/lyotvincent/AJIA.
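To make the idea of mapping VCF records to JSON objects concrete, the following is a minimal sketch, not the paper's actual mapping rules or algorithms. It assumes the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and ignores per-sample genotype columns. The JSON field names, the helper names `parse_info`, `vcf_line_to_doc`, and `vcf_to_docs`, and the sample record are all hypothetical choices made for illustration.

```python
import json

# The eight fixed VCF columns; the lowercase JSON field names used below
# are an assumption of this sketch, not the paper's mapping rules.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info):
    """Turn 'DP=14;AF=0.5;DB' into {'DP': '14', 'AF': '0.5', 'DB': True}."""
    if info == ".":
        return {}
    result = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Keys without '=' are flags; values are kept as strings here.
        result[key] = value if sep else True
    return result


def vcf_line_to_doc(line):
    """Map one tab-separated VCF data line to a JSON-serializable dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    return {
        "chrom": record["CHROM"],
        "pos": int(record["POS"]),
        "id": None if record["ID"] == "." else record["ID"],
        "ref": record["REF"],
        "alt": record["ALT"].split(","),
        "qual": None if record["QUAL"] == "." else float(record["QUAL"]),
        "filter": record["FILTER"],
        "info": parse_info(record["INFO"]),
    }


def vcf_to_docs(lines):
    """Skip blank lines and '#' header lines; convert the rest."""
    return [vcf_line_to_doc(l) for l in lines if l.strip() and not l.startswith("#")]


# Illustrative input only; not data from the paper.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB",
]
docs = vcf_to_docs(sample)
print(json.dumps(docs[0], indent=2))
```

Each resulting dictionary is a plain JSON object, so a list of them could then be loaded into a MongoDB collection with a driver call such as pymongo's `Collection.insert_many(docs)`. The sketch leaves that step out so that it runs without a database.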