de Lacy Nina, Ramshaw Michael J, Kutz J Nathan
de Lacy Laboratory, Department of Psychiatry, Huntsman Mental Health Institute, University of Utah, Salt Lake City, UT, United States.
Department of Applied Mathematics, AI Institute in Dynamic Systems, University of Washington, Seattle, WA, United States.
Front Artif Intell. 2022 Apr 5;5:832530. doi: 10.3389/frai.2022.832530. eCollection 2022.
Artificial intelligence and machine learning techniques have proved fruitful for attacking difficult problems in medicine and public health. These techniques have garnered strong interest for the analysis of the large, multi-domain open science datasets that are increasingly available in health research. Discovery science in large datasets is challenging given the unconstrained nature of the learning environment, in which there may be a large number of potential predictors and appropriate ranges for model hyperparameters are unknown. In addition, explainability is likely to be at a premium in order to support future hypothesis generation or analysis. Here, we present a novel method that addresses these challenges by exploiting evolutionary algorithms to optimize machine learning discovery science while exploring a large solution space and minimizing bias. We demonstrate that our approach, IEL, provides an automated, adaptive method for jointly learning features and hyperparameters while furnishing explainable models in which the original features used to make predictions can be recovered, even with artificial neural networks. In IEL, the machine learning algorithm of choice is nested inside an evolutionary algorithm that selects features and hyperparameters over generations on the basis of an information function in order to converge on an optimal solution. We apply IEL to three gold-standard machine learning algorithms in challenging, heterogeneous biobehavioral data: deep learning with artificial neural networks, decision tree-based techniques and baseline linear models. Using IEL, artificial neural networks achieved ≥ 95% accuracy, sensitivity and specificity and 45-73% in classification, as well as substantial gains over default settings.
IEL may be applied to a wide range of less-constrained or unconstrained discovery science problems in which the practitioner wishes to jointly learn features and hyperparameters in an adaptive, principled manner within the same algorithmic process. This approach offers significant flexibility, enlarges the solution space and mitigates bias that may arise from manual or semi-manual hyperparameter tuning and feature selection. It also makes it possible to select the inner machine learning algorithm based on the results of optimized learning for the problem at hand.
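The nested design described in the abstract (an inner learner wrapped by an evolutionary loop that jointly searches features and hyperparameters) can be illustrated with a minimal standard-library Python sketch. This is not the authors' IEL implementation and makes several assumptions: the inner learner is a k-nearest-neighbours classifier, each individual is a hypothetical (feature mask, k) pair, the data are synthetic, and the fitness is a stand-in (validation accuracy minus a small per-feature penalty) for the information function, which the abstract does not specify. All function names (`make_data`, `knn_predict`, `fitness`, `mutate`, `crossover`, `evolve`) are hypothetical. The default configuration (all columns, k = 5) is seeded into the first generation, so with elitism the best fitness found cannot fall below the default.

```python
import random
from typing import Tuple

# Hypothetical genome: (feature mask over the original columns, k for k-NN).
Individual = Tuple[Tuple[bool, ...], int]
K_CHOICES = (1, 3, 5, 7, 9)  # assumed hyperparameter search range (odd k avoids vote ties)


def make_data(n, n_features, seed=0):
    """Synthetic two-class data: columns 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] -= 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, mask, k):
    """Inner learner: majority vote of the k nearest neighbours on the selected columns."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b, keep in zip(row, x, mask) if keep), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def fitness(ind: Individual, data, penalty=0.01):
    """Stand-in objective (an assumption, not the paper's information function):
    validation accuracy minus a small per-feature parsimony penalty."""
    mask, k = ind
    if not any(mask):
        return -1.0  # an empty feature set is never acceptable
    train_X, train_y, val_X, val_y = data
    correct = sum(knn_predict(train_X, train_y, x, mask, k) == t
                  for x, t in zip(val_X, val_y))
    return correct / len(val_y) - penalty * sum(mask)


def mutate(ind: Individual, rng, p_flip=0.2, p_k=0.3) -> Individual:
    """Flip each feature bit with probability p_flip; resample k with probability p_k."""
    mask, k = ind
    new_mask = tuple((not b) if rng.random() < p_flip else b for b in mask)
    new_k = rng.choice(K_CHOICES) if rng.random() < p_k else k
    return new_mask, new_k


def crossover(a: Individual, b: Individual, rng) -> Individual:
    """Uniform crossover on the mask; k is inherited from one parent at random."""
    mask = tuple(p if rng.random() < 0.5 else q for p, q in zip(a[0], b[0]))
    return mask, rng.choice((a[1], b[1]))


def evolve(data, n_features, default_k=5, pop_size=12, generations=8, seed=1):
    rng = random.Random(seed)
    cache = {}

    def score(ind):
        # Fitness is deterministic, so each distinct individual is scored once.
        if ind not in cache:
            cache[ind] = fitness(ind, data)
        return cache[ind]

    default = (tuple([True] * n_features), default_k)
    # The default configuration is seeded into the initial random population.
    pop = [default] + [
        (tuple(rng.random() < 0.5 for _ in range(n_features)), rng.choice(K_CHOICES))
        for _ in range(pop_size - 1)
    ]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        history.append(score(ranked[0]))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    best = max(pop, key=score)
    return best, history, score(default)


X, y = make_data(80, 6)
data = (X[:50], y[:50], X[50:], y[50:])
best, history, default_score = evolve(data, n_features=6)
print("selected columns:", [i for i, keep in enumerate(best[0]) if keep], "k:", best[1])
print("best fitness per generation:", [round(h, 3) for h in history])
print("default (all columns, k=5):", round(default_score, 3))
```

In this sketch the evolved mask indexes the original input columns, so the features behind the final model can be read off directly whatever inner learner is used; swapping `knn_predict` for another model leaves the outer evolutionary loop unchanged.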