Highly scalable and robust rule learner: performance evaluation and comparison.

Authors

Kurgan Lukasz A, Cios Krzysztof J, Dick Scott

Affiliation

Department of Electrical and Computer Engineering, University of Alberta, Edmonton AB T6G 2VF, Canada.

Publication

IEEE Trans Syst Man Cybern B Cybern. 2006 Feb;36(1):32-53. doi: 10.1109/tsmcb.2005.852983.

DOI: 10.1109/tsmcb.2005.852983
PMID: 16468565
Abstract

Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.
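
The abstract describes a greedy builder of production rules from labeled data that tolerates missing values, but it does not give the algorithm itself. The sketch below is therefore not DataSqueezer: it is a generic greedy covering learner, written under the assumptions that attributes are discrete, that `None` marks a missing value, and that a missing value never satisfies a rule condition. The names `learn_rules`, `matches`, and `max_conditions` are hypothetical, and the sketch makes no attempt to reproduce the log-linear complexity reported in the abstract.

```python
from typing import Dict, List, Optional

Example = Dict[str, Optional[str]]  # None marks a missing value (assumption)
Rule = Dict[str, str]               # attribute -> required value


def matches(rule: Rule, example: Example) -> bool:
    """True when every condition equals the example's value.

    A missing value (None) never equals a required value, so it never
    satisfies a condition.
    """
    return all(example.get(attr) == value for attr, value in rule.items())


def learn_rules(positives: List[Example], negatives: List[Example],
                max_conditions: int = 3) -> List[Rule]:
    """Generic greedy covering: build one rule at a time for the positive class."""
    rules: List[Rule] = []
    uncovered = list(positives)
    while uncovered:
        rule: Rule = {}
        covered = uncovered          # positives the rule under construction matches
        neg_left = list(negatives)   # negatives the rule still matches
        # Add at least one condition, then specialise while negatives remain.
        while len(rule) < max_conditions and (not rule or neg_left):
            best, best_score = None, None
            for ex in covered:
                for attr, value in ex.items():
                    if value is None or attr in rule:
                        continue  # skip missing values and attributes already used
                    pos_hits = sum(1 for e in covered if e.get(attr) == value)
                    neg_hits = sum(1 for e in neg_left if e.get(attr) == value)
                    score = (pos_hits - neg_hits, pos_hits)
                    if best_score is None or score > best_score:
                        best, best_score = (attr, value), score
            if best is None:
                break  # nothing left to specialise on
            attr, value = best
            rule[attr] = value
            covered = [e for e in covered if e.get(attr) == value]
            neg_left = [e for e in neg_left if e.get(attr) == value]
        if not rule:
            break  # remaining positives have no usable values
        rules.append(rule)
        uncovered = [e for e in uncovered if not matches(rule, e)]
    return rules


# Toy data invented for illustration only.
positives = [
    {"outlook": "sunny", "wind": "weak"},
    {"outlook": "overcast", "wind": None},   # missing value
    {"outlook": "overcast", "wind": "weak"},
]
negatives = [
    {"outlook": "rain", "wind": "strong"},
    {"outlook": "sunny", "wind": "strong"},
]
rules = learn_rules(positives, negatives)
print(rules)
```

In this toy run the example with a missing `wind` value is not discarded: it is simply picked up by a second rule built from the attribute it does have. A rule can still match some negatives if `max_conditions` is reached or no usable attribute values remain.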

Similar articles

1
Highly scalable and robust rule learner: performance evaluation and comparison.
IEEE Trans Syst Man Cybern B Cybern. 2006 Feb;36(1):32-53. doi: 10.1109/tsmcb.2005.852983.
2
Fuzzy versus quantitative association rules: a fair data-driven comparison.
IEEE Trans Syst Man Cybern B Cybern. 2006 Jun;36(3):679-84. doi: 10.1109/tsmcb.2005.860134.
3
Association rule mining in peer-to-peer systems.
IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2426-38. doi: 10.1109/tsmcb.2004.836888.
4
Rule mining and classification in a situation assessment application: a belief-theoretic approach for handling data imperfections.
IEEE Trans Syst Man Cybern B Cybern. 2007 Dec;37(6):1446-59. doi: 10.1109/tsmcb.2007.903536.
5
PARM--an efficient algorithm to mine association rules from spatial data.
IEEE Trans Syst Man Cybern B Cybern. 2008 Dec;38(6):1513-24. doi: 10.1109/TSMCB.2008.927730.
6
Scaling genetic programming to large datasets using hierarchical dynamic subset selection.
IEEE Trans Syst Man Cybern B Cybern. 2007 Aug;37(4):1065-73. doi: 10.1109/tsmcb.2007.896406.
7
Minerva: sequential covering for rule extraction.
IEEE Trans Syst Man Cybern B Cybern. 2008 Apr;38(2):299-309. doi: 10.1109/TSMCB.2007.912079.
8
Evaluation of biomedical text-mining systems: lessons learned from information retrieval.
Brief Bioinform. 2005 Dec;6(4):344-56. doi: 10.1093/bib/6.4.344.
9
Distributed data mining on grids: services, tools, and applications.
IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2451-65. doi: 10.1109/tsmcb.2004.836890.
10
Multiple objective evolutionary algorithm for temporal linguistic rule extraction.
ISA Trans. 2005 Apr;44(2):315-27. doi: 10.1016/s0019-0578(07)60184-0.

Cited by

1
Improved machine learning method for analysis of gas phase chemistry of peptides.
BMC Bioinformatics. 2008 Dec 3;9:515. doi: 10.1186/1471-2105-9-515.