Department of Human Development and Family Studies, Pennsylvania State University, University Park.
Department of Psychology, Humboldt-Universität zu Berlin, Germany.
J Gerontol B Psychol Sci Soc Sci. 2017 Dec 15;73(1):113-123. doi: 10.1093/geronb/gbx008.
As diary, panel, and experience sampling methods become easier to implement, studies of development and aging are adopting more and more intensive study designs. However, if too many measures are included in such designs, interruptions for measurement may constitute a significant burden for participants. We propose the use of feature selection (a data-driven machine learning process) in study design and in the selection of measures that show the most predictive power in pilot data.
We introduce an analytical paradigm based on feature importance estimation and recursive feature elimination with decision tree ensembles, and illustrate its utility using empirical data from the German Socio-Economic Panel (SOEP).
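As an illustration of the general technique, recursive feature elimination with a tree ensemble can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes scikit-learn's `RFE` and `RandomForestRegressor`, the helper name `select_measures` is hypothetical, and the data are synthetic stand-ins for pilot data rather than SOEP measures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE


def select_measures(X, y, n_keep, step=1, seed=0):
    """Refit a random forest repeatedly, dropping the `step` least
    important measures each round until `n_keep` remain.
    (Hypothetical helper; the ensemble settings are assumptions.)"""
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    selector = RFE(estimator=forest, n_features_to_select=n_keep, step=step)
    selector.fit(X, y)
    return selector


# Synthetic pilot data: 8 candidate measures, of which only the first
# three drive the outcome (one main effect and one interaction).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=300)

selector = select_measures(X, y, n_keep=3)
kept = np.flatnonzero(selector.support_)  # column indices of retained measures
print("retained measures:", kept)
print("elimination ranking (1 = retained):", selector.ranking_)
```

The `ranking_` attribute records the order in which measures were eliminated, which can help when deciding how many measures a study can afford to keep.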
We identified a subset of 20 measures from the SOEP data set that maintain much of the ability of the original data set to predict life satisfaction and health across younger, middle, and older age groups.
Feature selection techniques permit researchers to choose measures that are maximally predictive of relevant outcomes, even when there are interactions or nonlinearities. These techniques facilitate decisions about which measures may be dropped from a study while maintaining efficiency of prediction across groups and reducing costs to the researcher and burden on the participants.
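One way to check whether a reduced set of measures maintains predictive efficiency across groups is to compare cross-validated performance of the full and reduced sets within each group. The sketch below assumes scikit-learn; the helper `r2_by_group`, the synthetic data, the group labels, and the retained column indices are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def r2_by_group(X, y, groups, columns, seed=0):
    """Mean 5-fold cross-validated R^2 within each group,
    using only the measures listed in `columns`."""
    scores = {}
    for g in np.unique(groups):
        mask = groups == g
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        cv = cross_val_score(model, X[mask][:, columns], y[mask],
                             cv=5, scoring="r2")
        scores[str(g)] = float(cv.mean())
    return scores


# Synthetic data: 10 candidate measures, outcome depends on two of them,
# one of them nonlinearly.
rng = np.random.default_rng(1)
n = 450
X = rng.normal(size=(n, 10))
groups = np.repeat(["younger", "middle", "older"], n // 3)
y = 2.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + rng.normal(scale=0.3, size=n)

full = r2_by_group(X, y, groups, columns=list(range(10)))
reduced = r2_by_group(X, y, groups, columns=[0, 1])  # assumed retained subset
for g in full:
    print(g, "full:", round(full[g], 3), "reduced:", round(reduced[g], 3))
```

In a real analysis the subset should be selected on data separate from the data used for this comparison; otherwise the reduced-set performance will be optimistically biased.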