• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

用于存储和查询集合的数据结构集合前缀树:理论与实证分析。

Data structure set-trie for storing and querying sets: Theoretical and empirical analysis.

作者信息

Savnik Iztok, Akulich Mikita, Krnc Matjaž, Škrekovski Riste

机构信息

Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia.

Faculty of Information Studies, Novo Mesto, Slovenia.

出版信息

PLoS One. 2021 Feb 10;16(2):e0245122. doi: 10.1371/journal.pone.0245122. eCollection 2021.

DOI:10.1371/journal.pone.0245122
PMID:33566827
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC7875400/
Abstract

Set containment operations form an important tool in various fields such as information retrieval, AI systems, object-relational databases, and Internet applications. In the paper, a set-trie data structure for storing sets is considered, along with the efficient algorithms for the corresponding set containment operations. We present the mathematical and empirical study of the set-trie. In the mathematical study, the relevant upper-bounds on the efficiency of its expected performance are established by utilizing a natural probabilistic model. In the empirical study, we give insight into how different distributions of input data impact the efficiency of set-trie. Using the correct parameters for those randomly generated datasets, we expose the key sources of the input sensitivity of set-trie. Finally, the empirical comparison of set-trie with the inverted index is based on the real-world datasets containing sets of low cardinality. The comparison shows that the running time of set-trie consistently outperforms the inverted index by orders of magnitude.

摘要

集合包含操作在诸如信息检索、人工智能系统、对象关系数据库和互联网应用等各个领域中构成了一个重要工具。在本文中,我们考虑了一种用于存储集合的集合字典树数据结构,以及针对相应集合包含操作的高效算法。我们给出了对集合字典树的数学和实证研究。在数学研究中,通过利用一种自然概率模型建立了其预期性能效率的相关上界。在实证研究中,我们深入了解输入数据的不同分布如何影响集合字典树的效率。通过为那些随机生成的数据集使用正确的参数,我们揭示了集合字典树输入敏感性的关键来源。最后,基于包含低基数集合的真实世界数据集,对集合字典树与倒排索引进行了实证比较。比较结果表明,集合字典树的运行时间始终比倒排索引快几个数量级。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/aebbbb9a4bcb/pone.0245122.g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/b16fbb66e6ea/pone.0245122.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/2c683876b71f/pone.0245122.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/423a0c9ef40d/pone.0245122.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/6d0527a74d4b/pone.0245122.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/47bc5ba4c8c3/pone.0245122.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/f89e7aeca948/pone.0245122.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/d32b7aa8716a/pone.0245122.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/65e87c9da8e0/pone.0245122.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/6fa5e289484f/pone.0245122.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/d79109be241d/pone.0245122.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/44011ad52564/pone.0245122.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/0932738c27d1/pone.0245122.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/7ce08cc6800b/pone.0245122.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/aebbbb9a4bcb/pone.0245122.g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/b16fbb66e6ea/pone.0245122.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/2c683876b71f/pone.0245122.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/423a0c9ef40d/pone.0245122.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/6d0527a74d4b/pone.0245122.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/47bc5ba4c8c3/pone.0245122.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/f89e7aeca948/pone.0245122.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/d32b7aa8716a/pone.0245122.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/65e87c9da8e0/pone.0245122.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/6fa5e289484f/pone.0245122.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/d79109be241d/pone.0245122.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/44011ad52564/pone.0245122.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/0932738c27d1/pone.0245122.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/7ce08cc6800b/pone.0245122.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0666/7875400/aebbbb9a4bcb/pone.0245122.g014.jpg

相似文献

1
Data structure set-trie for storing and querying sets: Theoretical and empirical analysis.用于存储和查询集合的数据结构集合前缀树:理论与实证分析。
PLoS One. 2021 Feb 10;16(2):e0245122. doi: 10.1371/journal.pone.0245122. eCollection 2021.
2
Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable.基于 Trie 的规则处理在临床自然语言处理中的应用:n-trie 的使用案例研究,使 ConText 算法更高效、更具可扩展性。
J Biomed Inform. 2018 Sep;85:106-113. doi: 10.1016/j.jbi.2018.08.002. Epub 2018 Aug 6.
3
Hierarchical trie packet classification algorithm based on expectation-maximization clustering.基于期望最大化聚类的层次化前缀树分组分类算法
PLoS One. 2017 Jul 13;12(7):e0181049. doi: 10.1371/journal.pone.0181049. eCollection 2017.
4
An efficient algorithm for mining closed itemsets.一种挖掘封闭项集的高效算法。
J Zhejiang Univ Sci. 2004 Jan;5(1):8-15. doi: 10.1007/BF02839306.
5
On optimizing syntactic pattern recognition using tries and AI-based heuristic-search strategies.
IEEE Trans Syst Man Cybern B Cybern. 2006 Jun;36(3):611-22. doi: 10.1109/tsmcb.2005.861860.
6
Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage.布隆过滤器前缀树:一种用于泛基因组存储的无比对和无参考的数据结构。
Algorithms Mol Biol. 2016 Apr 14;11:3. doi: 10.1186/s13015-016-0066-8. eCollection 2016.
7
Dimensionality reduction of clustered data sets.聚类数据集的降维
IEEE Trans Pattern Anal Mach Intell. 2008 Mar;30(3):535-40. doi: 10.1109/TPAMI.2007.70819.
8
Kernel entropy component analysis.核熵分量分析。
IEEE Trans Pattern Anal Mach Intell. 2010 May;32(5):847-60. doi: 10.1109/TPAMI.2009.100.
9
MergedTrie: Efficient textual indexing.合并 Trie:高效的文本索引。
PLoS One. 2019 Apr 23;14(4):e0215288. doi: 10.1371/journal.pone.0215288. eCollection 2019.
10
Fast graph-based relaxed clustering for large data sets using minimal enclosing ball.基于快速图的使用最小包围球的大数据集松弛聚类
IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):672-87. doi: 10.1109/TSMCB.2011.2172604. Epub 2012 Feb 3.