Decentralized genomics audit logging via permissioned blockchain ledgering.

Affiliations

Sandia National Laboratories, Albuquerque, NM, USA.

Sandia National Laboratories, Livermore, CA, USA.

Publication information

BMC Med Genomics. 2020 Jul 21;13(Suppl 7):102. doi: 10.1186/s12920-020-0720-3.

Abstract

BACKGROUND

One of the tasks in the 2018 iDASH Secure Genome Analysis Competition was to develop blockchain-based immutable logging and querying for a cross-site genomic dataset access audit trail. The specific challenge was to design a time- and space-efficient structure and mechanism for storing and retrieving genomic data access logs, based on MultiChain version 1.0.4 ( https://www.multichain.com/ ).

METHODS

Our technique uses the MultiChain stream application programming interface (API), which allows MultiChain to be treated as a key-value store, and employs a two-level index that naturally supports efficient queries with single-clause constraints. The scheme also supports heuristic and binary search techniques for queries containing conjunctions of clause constraints, as well as timestamp range queries. Notably, all of our techniques have complexity independent of the size of the inserted dataset, except for timestamp range queries, which scale logarithmically with input size.
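The abstract does not spell out the exact key layout, so the following is only a minimal sketch of how a two-level index over a key-value store could serve the three query types described above. It assumes a hypothetical layout in which level-1 keys of the form `field=value` map to sorted lists of record ids and level-2 keys map record ids to full log records; a plain Python dict stands in for the MultiChain stream, and the class and method names are illustrative, not the authors' code. Conjunctions start from the shortest candidate list (a simple selectivity heuristic) and test membership in the remaining lists by binary search; timestamp ranges are located with `bisect` over a sorted list.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


def _contains(sorted_ids, rid):
    """Binary search for rid in a sorted list of record ids."""
    i = bisect_left(sorted_ids, rid)
    return i < len(sorted_ids) and sorted_ids[i] == rid


class TwoLevelIndex:
    """In-memory stand-in for a MultiChain stream used as a key-value store.

    Hypothetical layout (an assumption, not the paper's exact scheme):
      level 1: "field=value" -> sorted list of record ids
      level 2: record id      -> full log record
    """

    def __init__(self):
        self.postings = defaultdict(list)  # level-1 keys
        self.records = {}                  # level-2 keys
        self.by_time = []                  # sorted (timestamp, record id) pairs

    def insert(self, rid, record):
        # Record ids are assumed to arrive in increasing order, so appending
        # keeps every posting list sorted for later binary search.
        # Every record is assumed to carry a "timestamp" field.
        self.records[rid] = record
        for field, value in record.items():
            self.postings[f"{field}={value}"].append(rid)
        insort(self.by_time, (record["timestamp"], rid))

    def query_clause(self, field, value):
        # Single clause: one level-1 lookup, then one level-2 lookup per hit.
        ids = self.postings.get(f"{field}={value}", [])
        return [self.records[r] for r in ids]

    def query_and(self, clauses):
        # Conjunction of (field, value) clauses.
        if not clauses:
            return []
        lists = [self.postings.get(f"{f}={v}", []) for f, v in clauses]
        lists.sort(key=len)  # heuristic: iterate over the most selective clause
        hits = [r for r in lists[0]
                if all(_contains(ids, r) for ids in lists[1:])]
        return [self.records[r] for r in hits]

    def query_time_range(self, start, end):
        # Inclusive [start, end]; bisect finds both bounds in O(log n).
        lo = bisect_left(self.by_time, (start,))
        hi = bisect_right(self.by_time, (end, float("inf")))
        return [self.records[r] for _, r in self.by_time[lo:hi]]
```

For example, after inserting a few access records, `query_and([("user", "alice"), ("gene", "BRCA1")])` returns only the records matching both clauses. In this in-memory sketch, keeping the timestamp list sorted with `insort` costs linear time per insert; only the range lookup itself is logarithmic.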

RESULTS

We implemented our insertion and querying techniques in Python, using the MultiChain library Savoir ( https://github.com/dxmarkets/savoir ), and comprehensively tested our implementation across a benchmark of datasets of varying sizes. We also tested a port of our challenge submission to a newer version of MultiChain (2.0 beta), which natively supports multiple indices.
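As an illustration only, the sketch below shows how insertion and single-clause lookup code might talk to a MultiChain 1.0 stream through a client such as Savoir, which forwards method calls as MultiChain JSON-RPC commands. It relies on the stream commands `publish` (which takes the item's data as a hex string) and `liststreamkeyitems`; the `field=value` and `id=<rid>` key layout, the JSON payload format, and the helper names are all hypothetical. A hypothetical `InMemoryStream` class stands in for a live node so the code runs offline; against a real node, the stream must already exist and the node must be subscribed to it before it can be queried.

```python
import json


def to_hex(obj):
    """JSON-encode obj and hex-encode the bytes; MultiChain 1.0 `publish` takes hex data."""
    return json.dumps(obj, sort_keys=True).encode("utf-8").hex()


def from_hex(data_hex):
    """Inverse of to_hex for a stream item's inline `data` field."""
    return json.loads(bytes.fromhex(data_hex).decode("utf-8"))


def publish_record(api, stream, rid, record, index_fields):
    """Store one log record using a hypothetical two-level key layout.

    `api` is assumed to behave like a Savoir client, where
    api.publish(...) issues the MultiChain `publish` JSON-RPC command.
    """
    # Level 2: the full record, stored once under its record id.
    api.publish(stream, f"id={rid}", to_hex(record))
    # Level 1: a small pointer item per indexed field value.
    for field in index_fields:
        api.publish(stream, f"{field}={record[field]}", to_hex(rid))


def lookup(api, stream, field, value, count=1000):
    """Single-clause query: resolve level-1 pointers, then fetch level-2 records.

    Reads at most `count` pointer items and assumes items are small enough
    that `data` is returned inline as a hex string.
    """
    pointers = api.liststreamkeyitems(stream, f"{field}={value}", False, count)
    records = []
    for item in pointers:
        rid = from_hex(item["data"])
        full = api.liststreamkeyitems(stream, f"id={rid}", False, 1)
        records.append(from_hex(full[0]["data"]))
    return records


class InMemoryStream:
    """Offline stand-in exposing the two RPC methods used above (testing only)."""

    def __init__(self):
        self.items = []

    def publish(self, stream, key, data_hex):
        self.items.append({"stream": stream, "key": key, "data": data_hex})
        return f"txid-{len(self.items)}"

    def liststreamkeyitems(self, stream, key, verbose=False, count=10):
        # Returns matching items in insertion order, up to `count` of them.
        hits = [i for i in self.items if i["stream"] == stream and i["key"] == key]
        return hits[:count]
```

With a real node, `api` would instead be a connected Savoir client and `stream` the name of an existing stream. Because MultiChain 1.0 `publish` takes a single key per item, this layout writes one pointer item per indexed field rather than tagging one item with several keys.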

CONCLUSIONS

We presented creative and efficient techniques for storing and querying log file data in MultiChain 1.0.4 and 2.0 beta. We demonstrated that it is feasible to use a permissioned blockchain ledger for genomic query log data when data volume is on the order of hundreds of megabytes and query times of dozens of minutes are acceptable. We also demonstrated that the evolution of the ledger platform (from MultiChain 1 to 2) yielded a 30%-40% increase in insertion efficiency. All source code for this challenge has been made available under a BSD-3 license at https://github.com/sandialabs/idash2018task1/ .


