Penha Emanuel Diego S, Iriabho Egiebade, Dussaq Alex, de Oliveira Diana Magalhães, Almeida Jonas S
Department of Pathology, Informatics Division, University of Alabama at Birmingham, Birmingham, AL 35233, USA.
Rede Nordeste de Biotecnologia, Universidade Estadual do Ceará, Fortaleza CE 60740-000, Brazil.
Bioinformatics. 2017 Feb 15;33(4):547-548. doi: 10.1093/bioinformatics/btw652.
The move of computational genomics workflows to Cloud Computing platforms is associated with a new level of integration and interoperability that challenges existing data representation formats. The Variant Call Format (VCF) is in a particularly sensitive position in that regard, with both clinical and consumer-facing analysis tools relying on this self-contained description of genomic variation in Next Generation Sequencing (NGS) results. In this report we identify an isomorphic map between VCF and the standard Resource Description Framework (RDF), which the World Wide Web Consortium (W3C) advances to enable representations of linked data that are both distributed and discoverable. The resulting ability to decompose VCF reports of genomic variation without loss of context addresses the need to modularize and govern NGS pipelines for Precision Medicine. Specifically, it provides the flexibility (i.e. the indexing) needed to support the wide variety of clinical scenarios and patient-facing governance where only part of the VCF data is relevant.
Software libraries that claim to be both domain-facing and consumer-facing must be portable across the variety of devices that those consumers actually adopt. Ideally, therefore, the implementation should itself take place within the space defined by web technologies. Consequently, the isomorphic mapping function was implemented in JavaScript and tested in a variety of environments and devices, client and server side alike. These range from web browsers on mobile phones to Node.js, the most popular microservice platform. The code is publicly available at https://github.com/ibl/VCFr, with a live deployment at http://ibl.github.io/VCFr/.
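To make the idea of an isomorphic VCF-to-RDF map concrete, the following is a minimal JavaScript sketch, not the VCFr implementation. It assumes a simplified VCF with only the eight fixed columns (no FORMAT or sample columns), and the namespace `urn:example:vcf#`, the functions `vcfToTriples` and `triplesToVcf`, and the predicate names are all hypothetical choices made for illustration. It decomposes a VCF text into subject-predicate-object triples, selects a subset of them by predicate, and rebuilds the original text to show that nothing was lost.

```javascript
// Hypothetical namespace for illustration only; not the vocabulary used by VCFr.
const NS = 'urn:example:vcf#';
// Assumption: only the eight fixed VCF columns are present.
const COLUMNS = ['CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO'];

// Decompose VCF text into [subject, predicate, object] triples.
function vcfToTriples(vcfText, fileId = 'file1') {
  const triples = [];
  const file = `${NS}${fileId}`;
  let metaCount = 0;
  let rowCount = 0;
  for (const line of vcfText.split('\n')) {
    if (line === '') continue;
    if (line.startsWith('##')) {
      // Meta-information line: keep its position in an indexed predicate.
      triples.push([file, `${NS}meta/${metaCount++}`, line.slice(2)]);
    } else if (line.startsWith('#')) {
      continue; // Column header: implied by COLUMNS in this sketch.
    } else {
      // Data line: one subject per row, one triple per column value.
      const row = `${file}/row/${rowCount++}`;
      triples.push([file, `${NS}hasRow`, row]);
      line.split('\t').forEach((value, i) => {
        triples.push([row, `${NS}${COLUMNS[i]}`, value]);
      });
    }
  }
  return triples;
}

// Rebuild the VCF text from the triples (the inverse map).
function triplesToVcf(triples, fileId = 'file1') {
  const file = `${NS}${fileId}`;
  const metaPrefix = `${NS}meta/`;
  const rowPrefix = `${file}/row/`;
  const meta = [];
  const rows = new Map();
  for (const [s, p, o] of triples) {
    if (s === file && p.startsWith(metaPrefix)) {
      meta[Number(p.slice(metaPrefix.length))] = `##${o}`;
    } else if (s.startsWith(rowPrefix)) {
      const index = Number(s.slice(rowPrefix.length));
      if (!rows.has(index)) rows.set(index, {});
      rows.get(index)[p.slice(NS.length)] = o;
    }
  }
  const body = [...rows.keys()]
    .sort((a, b) => a - b)
    .map((i) => COLUMNS.map((c) => rows.get(i)[c]).join('\t'));
  return [...meta, `#${COLUMNS.join('\t')}`, ...body].join('\n');
}

// Made-up example records, not data from the paper.
const sample = [
  '##fileformat=VCFv4.2',
  '##source=example',
  '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
  '20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14',
  '20\t17330\t.\tT\tA\t3\tq10\tDP=11',
].join('\n');

const triples = vcfToTriples(sample);

// Select only part of the data: the position of every variant.
const positions = triples
  .filter(([, p]) => p === `${NS}POS`)
  .map(([, , o]) => o);

// Round trip: the triples carry enough context to rebuild the file.
const roundTrip = triplesToVcf(triples);
console.log(positions, roundTrip === sample);
```

Because every triple names its own row and column, any subset (here, the positions alone) can be shared or queried without the rest of the file, which is the kind of partial disclosure the abstract describes. The sketch uses only core language features, so it should run unchanged in a browser console or under Node.js.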