Nakagawa Yoshiki, Ohata Satsuya, Shimizu Kana
Department of Computer Science and Engineering, Waseda University, Tokyo, Japan.
Self-employment, Tokyo, Japan.
Algorithms Mol Biol. 2022 Apr 26;17(1):9. doi: 10.1186/s13015-022-00211-1.
The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table lookup in such a way that [Formula: see text] is computed for a given depth of recursion where [Formula: see text] is an initial position, and V is a vector. We used the secure table lookup for vectors created based on FM-index. The notable feature of the secure table lookup is that time, communication, and round complexities are not dependent on the table length N, after the query input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently against the database length, and the entire search time is dramatically improved compared to previous approaches. We conducted an experiment using a human genome sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a non-indexed database search protocol under the realistic computation/network environment.
开发隐私保护技术对于加速基因组数据共享至关重要。本研究提出了一种算法,该算法可安全地搜索查询与数据库序列之间的可变长度子串匹配。我们的概念基于一种将FM索引有效应用于秘密共享方案的技术。更确切地说,我们开发了一种算法,该算法可以通过以下方式实现安全的表查找:对于给定的递归深度计算[公式:见原文],其中[公式:见原文]是初始位置,V是一个向量。我们将安全表查找用于基于FM索引创建的向量。安全表查找的显著特点是,在查询输入之后,时间、通信和轮次复杂度不依赖于表长度N。因此,通过参考基于FM索引的表进行子串匹配也可以独立于数据库长度进行,并且与以前的方法相比,整个搜索时间有了显著改善。我们使用长度为10 million的人类基因组序列作为数据库,并使用长度为100的查询进行了实验,发现在实际计算/网络环境下,我们协议的查询响应时间比非索引数据库搜索协议至少快三个数量级。