Program in Computational Biology and Bioinformatics, Yale University, Whitney Avenue, New Haven, 06520, CT, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, Whitney Avenue, New Haven, 06520, CT, USA.
BMC Med Genomics. 2020 Jun 1;13(1):74. doi: 10.1186/s12920-020-00732-x.
As pharmacogenomics data becomes increasingly integral to clinical treatment decisions, appropriate data storage and sharing protocols need to be adopted. One promising option for secure, high-integrity storage and sharing is Ethereum smart contracts. Ethereum is a blockchain platform, and smart contracts are immutable pieces of code that run on virtual machines in this platform and can be invoked by a user or by another contract in the blockchain network. The 2019 iDASH (Integrating Data for Analysis, Anonymization, and Sharing) competition for Secure Genome Analysis challenged participants to develop time- and space-efficient Ethereum smart contracts for gene-drug relationship data.
Here we design a smart contract to store and query gene-drug interactions in Ethereum using an index-based, multi-mapping approach. Our contract stores each pharmacogenomics observation, a gene-variant-drug triplet with its outcome, in a mapping searchable by a unique identifier, allowing for time- and space-efficient storage and querying. This solution ranked in the top three at the 2019 iDASH competition. We further improve on this "challenge solution" and develop an alternate "fastQuery" smart contract, which combines identical gene-variant-drug combinations into a single storage entry, leading to significantly better scalability and query efficiency.
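The two storage layouts described above can be illustrated with a minimal Python sketch. This is not the authors' contract code: the class names `ChallengeStore` and `FastQueryStore`, the per-field index dictionaries, the outcome-count representation, and the example gene and drug values are all assumptions chosen only to show the idea of one entry per observation versus one entry per distinct gene-variant-drug combination.

```python
from collections import defaultdict


class ChallengeStore:
    """Hypothetical model: one stored entry per observation, keyed by a unique ID."""

    def __init__(self):
        self.entries = {}                    # id -> (gene, variant, drug, outcome)
        self.by_gene = defaultdict(list)     # gene -> list of entry IDs
        self.by_variant = defaultdict(list)  # variant -> list of entry IDs
        self.by_drug = defaultdict(list)     # drug -> list of entry IDs
        self.next_id = 0

    def insert(self, gene, variant, drug, outcome):
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = (gene, variant, drug, outcome)
        # Each field index records which entry IDs carry that value.
        self.by_gene[gene].append(entry_id)
        self.by_variant[variant].append(entry_id)
        self.by_drug[drug].append(entry_id)
        return entry_id


class FastQueryStore:
    """Hypothetical model: identical triplets share a single stored entry."""

    def __init__(self):
        self.entries = {}  # (gene, variant, drug) -> {outcome: count}

    def insert(self, gene, variant, drug, outcome):
        counts = self.entries.setdefault((gene, variant, drug), {})
        counts[outcome] = counts.get(outcome, 0) + 1


# Illustrative observations (made-up values, not data from the paper).
observations = [
    ("CYP2D6", "1", "codeine", "improved"),
    ("CYP2D6", "1", "codeine", "unchanged"),
    ("CYP2C19", "2", "clopidogrel", "improved"),
]

challenge = ChallengeStore()
fast = FastQueryStore()
for obs in observations:
    challenge.insert(*obs)
    fast.insert(*obs)

print(len(challenge.entries), len(fast.entries))
```

In this toy model the number of stored entries in the second layout grows with the number of distinct combinations rather than with the number of observations, which is one plausible reading of why merging identical combinations would help scalability.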
On a private, proof-of-authority network, both our challenge and fastQuery solutions exhibit approximately linear memory and time usage for inserting into and querying small databases (<1,000 entries). For larger databases (1,000 to 10,000 entries), fastQuery maintains this scaling. Furthermore, both solutions can query by a single field ("0-AND") or a combination of fields ("1- or 2-AND"). Specifically, the challenge solution can complete a 2-AND query on a small database (100 entries) in 35 ms using 0.1 MB of memory. For the same query, fastQuery achieves a 2-fold improvement in time and a 10-fold improvement in memory.
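To make the "0-AND", "1-AND", and "2-AND" terminology concrete, the following Python sketch treats a query as one, two, or three fixed fields joined by AND, and answers it by intersecting per-field sets of entry IDs. It is an illustration under assumptions, not the contract's actual logic: the `"*"` wildcard, the helper names `build_indexes` and `query`, and the sample records are hypothetical.

```python
from collections import defaultdict

WILDCARD = "*"  # assumed marker meaning "any value" for a field
FIELDS = ("gene", "variant", "drug")


def build_indexes(observations):
    """Map each field value to the set of entry positions that carry it."""
    indexes = {field: defaultdict(set) for field in FIELDS}
    for entry_id, obs in enumerate(observations):
        for field in FIELDS:
            indexes[field][obs[field]].add(entry_id)
    return indexes


def query(observations, indexes, gene=WILDCARD, variant=WILDCARD, drug=WILDCARD):
    """Return the observations matching every non-wildcard field."""
    fixed = {f: v for f, v in zip(FIELDS, (gene, variant, drug)) if v != WILDCARD}
    if not fixed:
        return list(observations)  # no constraint: everything matches
    # One ID set per fixed field; the AND of the fields is their intersection.
    id_sets = [indexes[f].get(v, set()) for f, v in fixed.items()]
    matching = set.intersection(*id_sets)
    return [observations[i] for i in sorted(matching)]


# Illustrative records (made-up values, not data from the paper).
records = [
    {"gene": "CYP2D6", "variant": "1", "drug": "codeine", "outcome": "improved"},
    {"gene": "CYP2D6", "variant": "2", "drug": "codeine", "outcome": "unchanged"},
    {"gene": "CYP2C19", "variant": "1", "drug": "clopidogrel", "outcome": "improved"},
]
idx = build_indexes(records)

zero_and = query(records, idx, gene="CYP2D6")                               # one field
one_and = query(records, idx, gene="CYP2D6", drug="codeine")                # two fields
two_and = query(records, idx, gene="CYP2D6", variant="2", drug="codeine")   # three fields
```

Each additional fixed field adds one AND and can only narrow the result set, so a 2-AND query pins down a single gene-variant-drug combination.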
We show that pharmacogenomics data can be stored and queried efficiently using the Ethereum blockchain. Our solutions could potentially be used to store a range of clinical data and could be extended to other fields requiring high-integrity data storage and efficient access.