Item response theory as a feature selection and interpretation tool in the context of machine learning.

Affiliations

Department of Biomedical Engineering, University of Calgary, Calgary, AB, Canada.

Undergraduate Medical Education, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.

Publication information

Med Biol Eng Comput. 2021 Feb;59(2):471-482. doi: 10.1007/s11517-020-02301-x. Epub 2021 Feb 3.

Abstract

Optimizing the number and utility of features used in a classification analysis has been the subject of many research studies. Most current models use end-classifications as part of the feature reduction process, which introduces circularity into the methodology. The approach demonstrated in the present research uses item response theory (IRT) to select features independently of the end-classification results, avoiding the biased accuracies that this circularity engenders. Dichotomous and polytomous IRT models were used to analyze 30 histological breast cancer features from 569 patients in the Wisconsin Diagnostic Breast Cancer data set. Based on their characteristics, three features were selected for use in a machine learning classifier. For comparison, two machine learning-based feature selection protocols were run, recursive feature elimination (RFE) and ridge regression, and the three features selected from these analyses were also used in the subsequent classifier. Classification results demonstrated that all three selection processes performed comparably. The non-biased nature of the IRT protocol, together with the information it provides about the specific characteristics that make a feature useful for classification, helps clarify which attributes make features suitable for use in a machine learning context.
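The abstract does not state how IRT item characteristics are turned into a feature ranking. As a minimal sketch, assume each feature has been dichotomized (for example, above or below its median) and a two-parameter logistic (2PL) model has already been fitted, giving each feature a discrimination `a` and a difficulty `b`. The 2PL item information a² · P(θ) · (1 − P(θ)) then measures how sharply a feature separates cases along the latent trait θ. The feature names `feature_A` to `feature_D`, their (a, b) values, and the rule of summing information over a θ grid are all hypothetical illustrations, not the study's selection criteria or results.

```python
from __future__ import annotations

import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability of a 'positive' response at theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def rank_by_information(
    items: dict[str, tuple[float, float]], thetas: list[float]
) -> list[tuple[str, float]]:
    """Rank items by information summed over a grid of theta values (hypothetical rule)."""
    scores = {
        name: sum(item_information(t, a, b) for t in thetas)
        for name, (a, b) in items.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical (a, b) estimates for four dichotomized features; NOT values from the study.
items = {
    "feature_A": (2.5, 0.0),   # highly discriminating, centred on the trait
    "feature_B": (1.8, 0.5),
    "feature_C": (0.6, -0.2),  # weakly discriminating
    "feature_D": (1.2, 2.5),   # difficulty near the edge of the grid
}
grid = [x / 10 for x in range(-30, 31)]  # theta from -3.0 to 3.0 in steps of 0.1
top3 = [name for name, _ in rank_by_information(items, grid)[:3]]
print(top3)
```

Summing information over a bounded grid approximates the area under each item's information curve. Over an unbounded θ range that area equals `a` for a 2PL item, so the bounded grid additionally down-weights features whose difficulty `b` lies near or outside the range of interest, as `feature_D` illustrates.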
