Cao Longbing, Zhang Huaifeng, Zhao Yanchang, Luo Dan, Zhang Chengqi
University of Technology, Sydney (UTS), Sydney, NSW, Australia.
IEEE Trans Syst Man Cybern B Cybern. 2011 Jun;41(3):699-712. doi: 10.1109/TSMCB.2010.2086060.
Enterprise data mining applications often involve complex data such as multiple large heterogeneous data sources, user preferences, and business impact. In such situations, a single method or one-step mining is often limited in discovering informative knowledge. It would also be very time- and space-consuming, if not impossible, to join relevant large data sources in order to mine patterns consisting of multiple aspects of information. It is crucial to develop effective approaches for mining patterns that combine necessary information from multiple relevant business lines, catering for real business settings and decision-making actions rather than just providing a single line of patterns. Recent years have seen increasing efforts to mine more informative patterns, e.g., integrating frequent pattern mining with classification to generate frequent pattern-based classifiers. Rather than presenting a specific algorithm, this paper builds on our existing work and proposes combined mining as a general approach to mining informative patterns that combine components from multiple data sets, multiple features, or multiple methods, on demand. We summarize general frameworks, paradigms, and basic processes for multifeature combined mining, multisource combined mining, and multimethod combined mining. Such frameworks can yield novel types of combined patterns, such as incremental cluster patterns, which cannot be directly produced by existing methods. A set of real-world case studies has been conducted to test the frameworks, and some of them are briefly described in this paper. The case studies identify combined patterns that inform government debt prevention and help improve government service objectives, demonstrating the flexibility and instantiation capability of combined mining in discovering informative knowledge in complex data.
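The idea of multisource combined mining can be illustrated with a toy sketch. The code below is not the algorithm from the paper; it is a minimal illustration that assumes two hypothetical data sources keyed by the same customer IDs (`demographics` and `activities`), a hypothetical `outcome` label per customer, and a brute-force frequent itemset search. Each source is mined on its own, and patterns are then paired by intersecting the sets of matching IDs, so the full tables are never joined. All names, data values, and thresholds are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(records, min_support, max_size=2):
    """Brute-force frequent itemsets for one source.

    records maps an entity id to a set of items. Returns a dict mapping
    each frequent itemset (frozenset) to the set of ids that contain it.
    """
    n = len(records)
    items = sorted({item for its in records.values() for item in its})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cover = {rid for rid, its in records.items() if set(candidate) <= its}
            if len(cover) / n >= min_support:
                result[frozenset(candidate)] = cover
    return result


def combined_patterns(source_a, source_b, outcome, target,
                      min_support, min_confidence):
    """Pair patterns mined separately from two sources.

    A pair is kept when enough ids match both patterns (support) and a
    large enough share of those ids have the target outcome (confidence).
    """
    patterns_a = frequent_itemsets(source_a, min_support)
    patterns_b = frequent_itemsets(source_b, min_support)
    n = len(outcome)
    combined = []
    for pa, cover_a in patterns_a.items():
        for pb, cover_b in patterns_b.items():
            cover = cover_a & cover_b  # ids matching both patterns
            support = len(cover) / n
            if support < min_support:
                continue
            hits = sum(1 for rid in cover if outcome[rid] == target)
            confidence = hits / len(cover)
            if confidence >= min_confidence:
                combined.append((pa, pb, support, confidence))
    return combined


# Hypothetical toy data: four customers described by two separate sources.
demographics = {
    1: {"age:young", "single"},
    2: {"age:young", "single"},
    3: {"age:old", "married"},
    4: {"age:young", "married"},
}
activities = {
    1: {"late_report", "address_change"},
    2: {"late_report"},
    3: {"late_report"},
    4: {"address_change"},
}
outcome = {1: "debt", 2: "debt", 3: "no_debt", 4: "no_debt"}

results = combined_patterns(demographics, activities, outcome,
                            target="debt", min_support=0.5, min_confidence=0.9)
for pa, pb, support, confidence in results:
    print(sorted(pa), "+", sorted(pb), "support", support, "confidence", confidence)
```

In this toy data, neither `age:young` alone nor `late_report` alone predicts debt with confidence above 2/3, while their combination does for every matching customer. This mirrors, in miniature, the abstract's claim that patterns combining several aspects of information can be more informative than single-line patterns.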