Roland Albert A. Romero, Mariefel Nicole Y. Deypalan, Suchit Mehrotra, John Titus Jungao, Natalie E. Sheils, Elisabetta Manduchi, Jason H. Moore
OptumLabs, Minnetonka, MN 55343, USA.
Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, CA 90069, USA.
BioData Min. 2022 Jul 26;15(1):15. doi: 10.1186/s13040-022-00300-2.
Ascertain and compare the performance of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets.
We generated a large dataset from historical de-identified administrative claims, including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performance on several metrics.
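As a rough illustration of this training-and-evaluation workflow (the abstract does not name the specific AutoML tools, so TPOT is used here only as a stand-in for a scikit-learn-compatible AutoML library; the file path, column names, and hyperparameters are hypothetical, not the study's actual setup):

```python
# Minimal sketch: train one AutoML tool on a claims-derived dataset and score
# it on held-out data. All names below are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score, roc_auc_score
from tpot import TPOTClassifier

claims = pd.read_csv("claims_features.csv")      # demographics + disease-code flags (hypothetical file)
X = claims.drop(columns=["outcome_2019"])        # hypothetical label column
y = claims["outcome_2019"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Optimize for average precision (area under the precision-recall curve),
# which is more informative than accuracy on highly imbalanced outcomes.
automl = TPOTClassifier(
    generations=5, population_size=20,
    scoring="average_precision", cv=5, n_jobs=-1, random_state=42
)
automl.fit(X_train, y_train)

proba = automl.predict_proba(X_test)[:, 1]
print("AUPRC:", average_precision_score(y_test, proba))
print("AUROC:", roc_auc_score(y_test, proba))
```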
The AutoML tools improved on the baseline random forest model but did not differ significantly from one another. All models recorded a low area under the precision-recall curve and failed to identify true positives while keeping the true negative rate high. Model performance was not directly related to outcome prevalence. We provide a specific use case to illustrate how to select a threshold that gives the best balance between true and false positive rates, an important consideration in medical applications.
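The threshold-selection idea can be made concrete with a short sketch. One common way to formalize a "best balance between true and false positive rates" is Youden's J statistic (TPR − FPR); the paper's use-case-driven criterion may weight the two rates differently, so the synthetic data and criterion below are illustrative assumptions only:

```python
# Pick a decision threshold that balances TPR against FPR on a validation set.
# Youden's J (TPR - FPR) is one standard criterion; a clinical application may
# instead cap the acceptable FPR and maximize TPR under that constraint.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, highly imbalanced data standing in for the claims features.
X, y = make_classification(
    n_samples=50_000, n_features=30, weights=[0.98, 0.02], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

scores = (
    RandomForestClassifier(n_estimators=200, random_state=0)
    .fit(X_train, y_train)
    .predict_proba(X_val)[:, 1]
)

fpr, tpr, thresholds = roc_curve(y_val, scores)
best = np.argmax(tpr - fpr)                      # Youden's J
print(f"threshold={thresholds[best]:.3f}  TPR={tpr[best]:.3f}  FPR={fpr[best]:.3f}")
```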
Healthcare datasets present several challenges for AutoML tools, including large sample sizes, severe class imbalance, and limitations in the available features. Improvements in scalability, combinations of imbalance-learning resampling with ensemble approaches, and curated feature selection are possible next steps toward better performance; a sketch of one such resampling-plus-ensemble combination follows.
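The combination could, for example, be prototyped with the imbalanced-learn library. This is only a sketch of one possible pairing (SMOTE oversampling feeding a random-forest ensemble) on synthetic data, not an approach evaluated in the paper:

```python
# Sketch: combine imbalance-learning resampling with an ensemble model using
# imbalanced-learn. The imblearn Pipeline applies SMOTE only to the training
# folds during cross-validation; the random forest is the ensemble learner.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(
    n_samples=20_000, n_features=30, weights=[0.97, 0.03], random_state=0
)

pipe = Pipeline(steps=[
    ("resample", SMOTE(random_state=0)),                       # oversample minority class
    ("forest", RandomForestClassifier(n_estimators=300, random_state=0)),
])

# Average precision approximates the area under the precision-recall curve.
auprc = cross_val_score(pipe, X, y, scoring="average_precision", cv=5, n_jobs=-1)
print("mean AUPRC:", auprc.mean())
```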
Among the three tools explored, no AutoML tool consistently outperformed the rest in predictive performance. The performance of the models in this study suggests that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application.