Cannataro Mario, Congiusta Antonio, Pugliese Andrea, Talia Domenico, Trunfio Paolo
Università di Catanzaro, 88100 Catanzaro, Italy.
IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2451-65. doi: 10.1109/tsmcb.2004.836890.
Data mining algorithms are widely used to analyze large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce often need to analyze very large datasets maintained across geographically distributed sites, using the computational power of distributed and parallel systems. Grids can play a significant role in providing effective computational support for distributed knowledge discovery applications. To support the development of data mining applications on grids, we designed a system called the Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset it provides for implementing distributed knowledge discovery. It discusses how to design and implement data mining applications with the Knowledge Grid tools, from searching grid resources, through composing software and data components, to executing the resulting data mining process on a grid. Some performance results are also discussed.