TKFIM: Top-K frequent itemset mining technique based on equivalence classes.

Authors

Iqbal Saood, Shahid Abdul, Roman Muhammad, Khan Zahid, Al-Otaibi Shaha, Yu Lisu

Affiliations

Institute of Computing, Kohat University of Science & Technology, Kohat, Kohat, KPK, Pakistan.

Robotics and Internet of Things Lab, Prince Sultan University, Riyadh, Saudi Arabia.

Publication information

PeerJ Comput Sci. 2021 Mar 8;7:e385. doi: 10.7717/peerj-cs.385. eCollection 2021.

DOI:10.7717/peerj-cs.385
PMID:33817031
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7959650/
Abstract

Frequent itemset mining is a significant subject of data mining research. Over the last ten years, driven by technological innovation, the quantity of data has grown exponentially, which poses new challenges for frequent itemset (FI) mining applications. Recent algorithms, both threshold-based and size-based, may return misleading information. The support threshold plays a central role in generating frequent itemsets from a given dataset, yet selecting its value is very difficult for those unaware of the dataset's characteristics. Algorithms that find FIs without a support threshold, however, perform poorly because of heavy computation. We therefore propose a method for discovering FIs without a support threshold, called Top-k frequent itemsets mining (TKFIM). It uses equivalence classes and set-theory concepts to mine FIs. The proposed procedure does not miss any FIs, so accurate frequent patterns are mined. The results are compared with state-of-the-art techniques such as Top-k miner and Build Once and Mine Once (BOMO). TKFIM outperforms these approaches in terms of execution and performance, achieving gains of 92.70, 35.87, 28.53, and 81.27 percent over Top-k miner on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. Similarly, it achieves gains of 97.14, 100, 78.10, and 99.70 percent over BOMO on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. We therefore argue that the proposed procedure may be adopted on large datasets for better performance.
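The abstract names the ingredients (no user-supplied support threshold, equivalence classes, set operations) but does not spell out the procedure. The following is therefore only a minimal sketch of the general idea, not the published TKFIM algorithm. It assumes an Eclat-style vertical layout in which each item maps to the set of transaction ids containing it, treats all itemsets sharing a prefix as one equivalence class, and raises an internal threshold as a size-k min-heap fills. The function name `top_k_itemsets` and the toy transactions are hypothetical, and items are assumed to be mutually comparable (for example, strings).

```python
import heapq
from itertools import count


def top_k_itemsets(transactions, k):
    """Return the k highest-support itemsets as (itemset, support) pairs.

    Illustrative Eclat-style sketch; NOT the published TKFIM procedure.
    """
    if k <= 0:
        return []

    # Vertical layout: item -> set of ids of the transactions that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    heap = []         # min-heap of (support, tie_breaker, itemset)
    ticket = count()  # unique tie-breaker, so itemsets are never compared

    def can_enter(support):
        # Until the heap holds k entries everything qualifies; afterwards
        # the smallest stored support acts as the (rising) threshold.
        return len(heap) < k or support > heap[0][0]

    def offer(itemset, support):
        entry = (support, next(ticket), itemset)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heapreplace(heap, entry)

    def mine(prefix, members):
        # `members` is one equivalence class: extensions sharing `prefix`.
        members.sort(key=lambda m: len(m[1]), reverse=True)
        for i, (item, tids) in enumerate(members):
            if not can_enter(len(tids)):
                break  # later members and all supersets have support <= this
            itemset = prefix + (item,)
            offer(tuple(sorted(itemset)), len(tids))
            child = []
            for other, other_tids in members[i + 1:]:
                shared = tids & other_tids  # support via set intersection
                if shared and can_enter(len(shared)):
                    child.append((other, shared))
            if child:
                mine(itemset, child)

    mine((), list(tidsets.items()))
    ranked = sorted(heap, key=lambda e: (-e[0], e[2]))
    return [(itemset, support) for support, _, itemset in ranked]


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "c"]]
    print(top_k_itemsets(toy, 3))
```

Because support can only shrink when an itemset grows, any branch whose support cannot beat the current k-th best is skipped safely. Ties at the k-th position are broken by discovery order in this sketch, so the set of supports is exact while the choice among equally frequent itemsets is arbitrary.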



Cited by

1. Efficient Top-K Identical Frequent Itemsets Mining without Support Threshold Parameter from Transactional Datasets Produced by IoT-Based Smart Shopping Carts. Sensors (Basel). 2022 Oct 21;22(20):8063. doi: 10.3390/s22208063.