Zhang Wei, Yu Yao-Chsi, Li Jr-Shin
Washington University in St. Louis, St. Louis, USA.
Data Min Knowl Discov. 2019 Nov;33(6):1710-1735. doi: 10.1007/s10618-019-00639-x. Epub 2019 Jun 24.
Knowledge discovery and information extraction from large and complex datasets have attracted great attention in wide-ranging areas from statistics and biology to medicine. Tools from machine learning, data mining, and neurocomputing have been extensively explored and utilized to accomplish such data analytics tasks. However, for time-series data presenting active dynamic characteristics, many of the state-of-the-art techniques may not perform well in capturing the inherent temporal structures in these data. In this paper, integrating the Koopman operator and linear dynamical systems theory with support vector machines, we develop a novel dynamic data mining framework to construct low-dimensional linear models that approximate the nonlinear flow of high-dimensional time-series data generated by unknown nonlinear dynamical systems. This framework immediately enables pattern recognition, e.g., classification, of complex time-series data, distinguishing their dynamic behaviors by using the trajectories generated by the reduced linear systems. Moreover, we demonstrate the applicability and efficiency of this framework through time-series classification problems in bioinformatics and healthcare, including cognitive classification with fMRI data and seizure detection with EEG data. The developed Koopman dynamic learning framework lays a solid foundation for effective dynamic data mining and promises a mathematically justified method for extracting the dynamics and significant temporal structures of nonlinear dynamical systems.
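The general idea of fitting a linear model to nonlinear time-series data and then classifying on that model can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the raw state as the only observables (plain dynamic mode decomposition, the simplest finite-dimensional Koopman approximation), uses eigenvalue moduli of the fitted matrix as features, and trains a small linear soft-margin SVM by subgradient descent as a stand-in for a library SVM. The damped-pendulum toy data and all function names (simulate, fit_linear_model, features, train_linear_svm, classify) are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate(damping, theta0, n_steps=200, dt=0.05):
    """Euler-integrate a damped pendulum: theta'' = -sin(theta) - damping * theta'."""
    x = np.array([theta0, 0.0])
    snapshots = [x]
    for _ in range(n_steps):
        theta, omega = x
        x = x + dt * np.array([omega, -np.sin(theta) - damping * omega])
        snapshots.append(x)
    return np.array(snapshots).T  # shape (2, n_steps + 1)

def fit_linear_model(traj):
    """Least-squares A with x[k+1] ~= A x[k] (DMD with the raw state as observables)."""
    X, Y = traj[:, :-1], traj[:, 1:]
    B, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)  # solves X.T @ B ~= Y.T
    return B.T

def features(traj):
    """Sorted eigenvalue moduli of the fitted linear model (assumed feature choice)."""
    return np.sort(np.abs(np.linalg.eigvals(fit_linear_model(traj))))

def train_linear_svm(F, y, lam=0.01, lr=0.1, epochs=500):
    """Linear soft-margin SVM: full-batch subgradient descent on the
    L2-regularized hinge loss. Labels y must be in {-1, +1}."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (F @ w + b) < 1  # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * F[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy training set: weakly damped trajectories (label -1) vs strongly damped (+1).
train = [(c, th, -1.0) for c in (0.1, 0.2, 0.3) for th in (0.5, 1.0)]
train += [(c, th, 1.0) for c in (1.0, 1.2, 1.4) for th in (0.5, 1.0)]
F = np.array([features(simulate(c, th)) for c, th, _ in train])
y = np.array([label for _, _, label in train])

# Standardize features so the subgradient steps are well scaled.
mu, sigma = F.mean(axis=0), F.std(axis=0)
w, b = train_linear_svm((F - mu) / sigma, y)

def classify(traj):
    """Return -1 (weakly damped) or +1 (strongly damped) for a new trajectory."""
    score = ((features(traj) - mu) / sigma) @ w + b
    return 1 if score > 0 else -1

print(classify(simulate(0.15, 0.8)), classify(simulate(1.3, 0.8)))
```

In this toy setting the eigenvalue moduli of the fitted matrix track the decay rate of each trajectory, which is why they separate the two classes. For real high-dimensional recordings such as fMRI or EEG, a richer set of observables and a rank-reduction step would likely be needed; those choices are outside what this sketch covers.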