University of California Davis, USA.
Soc Sci Res. 2023 Feb;110:102817. doi: 10.1016/j.ssresearch.2022.102817. Epub 2022 Oct 29.
The interdisciplinary field of knowledge discovery and data mining emerged because big data require new analytical methods, beyond traditional statistical approaches, for discovering new knowledge in the data. This emergent approach is a dialectic research process that is both deductive and inductive. Data mining automatically or semi-automatically considers a larger number of joint, interactive, and independent predictors to address causal heterogeneity and improve prediction. Rather than challenging the conventional model-building approach, it plays an important complementary role: improving model goodness of fit, revealing valid and significant hidden patterns in data, identifying nonlinear and non-additive effects, providing insights into data developments, methods, and theory, and enriching scientific discovery. Machine learning builds models and algorithms by learning and improving from data when the explicit model structure is unclear and well-performing algorithms are difficult to attain. The most recent development is to integrate this new paradigm of predictive modeling with the classical approach of parameter-estimation regression to produce improved models that combine explanation and prediction.