Lenert Matthew C, Walsh Colin G
Vanderbilt University, Nashville, TN.
Vanderbilt University Medical Center, Nashville, TN.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1377-1386. eCollection 2018.
Informaticists sometimes attempt to predict chronic healthcare events that are not fully understood. The resulting models often incorporate large numbers of predictors derived from diverse datasets. This approach may yield desirable performance characteristics, but it sacrifices interpretability and portability. The Bootstrapped Ridge Selector (BoRidge) offers a tool to balance performance with interpretability. The BoRidge was compared with two modern feature selection methods, Bootstrapped LASSO regression (BoLASSO) and a minimal-redundancy-maximal-relevance selector (mRMR). For binary classification on artificially generated data, the BoRidge (sensitivity: 0.83, specificity: 0.72) outperformed both BoLASSO (sensitivity: 0.1, specificity: 1) and mRMR (sensitivity: 0.69, specificity: 0.69). On a dataset used to validate a published suicide risk prediction model, the BoRidge selected a model as precise as the published one with far fewer predictors (114 versus the 1,538 used in the published model). The BoRidge has the potential to simplify classification models for complex problems, making them easier to translate and act upon.
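The sensitivity and specificity figures above follow the standard definitions for a binary outcome. The abstract does not state whether they were computed over classified observations or over recovered predictors, so the minimal sketch below assumes only two equal-length lists of 0/1 labels. The function name and the example data are hypothetical and are not taken from the paper.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for two equal-length 0/1 sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    # Sensitivity: share of true positives that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: share of true negatives that were left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical labels for illustration only (not data from the study).
example_truth = [1, 1, 1, 0, 0, 0, 0]
example_flags = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(example_truth, example_flags)
```

A selector that flags very few items, as the BoLASSO figures (0.1 and 1) suggest, will tend toward low sensitivity and high specificity under these definitions.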