IRTCI: Item Response Theory for Categorical Imputation.

Authors

Kline Adrienne, Luo Yuan

Affiliations

Department of Surgery, Northwestern University, Chicago, postcode, USA.

Center for Artificial Intelligence, Northwestern Medicine, Chicago, USA.

Publication

Res Sq. 2024 Jul 2:rs.3.rs-4529519. doi: 10.21203/rs.3.rs-4529519/v1.

DOI:10.21203/rs.3.rs-4529519/v1
PMID:39011102
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11247932/
Abstract

Most datasets suffer from partially or completely missing values, which restricts both the models that can be applied to the data and the statistical inferences that can be drawn from it. Several imputation techniques have been designed to replace missing data with stand-in values, and the choice of approach has implications for calculating clinical scores, building models and testing models. The work showcased here offers a novel means of categorical imputation based on item response theory (IRT) and compares it against several methods currently used in machine learning: k-nearest neighbors (kNN), multiple imputation by chained equations (MICE), and Datawig, a deep learning method from Amazon Web Services (AWS). The techniques were compared on three datasets representing ordinal, nominal and binary categories. The data were modified so that they also varied in both the proportion of missing data and the degree to which the missingness was systematic. Two assessments of performance were conducted: accuracy in reproducing the missing values, and predictive performance using the imputed data. Results showed that the new method, Item Response Theory for Categorical Imputation (IRTCI), fared quite well compared to currently used methods, outperforming several of them in many conditions. Given its theoretical basis and its distinctive generation of category-membership probabilities for each missing cell, IRTCI offers a viable alternative to current approaches.
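The abstract does not describe how IRTCI fits its model, so the following is only a minimal sketch of the general idea it names: fit an IRT model to the observed cells, then read off a category probability for each missing cell. It assumes binary items and a two-parameter logistic (2PL) model fitted by penalized joint maximum likelihood with plain gradient ascent. The function names `fit_2pl` and `impute_2pl`, the L2 penalty `lam`, the learning rate and the 0.5 threshold are all illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_2pl(data, n_iter=2000, lr=0.1, lam=0.1):
    """Penalized joint maximum-likelihood fit of a 2PL IRT model (illustrative).

    data: list of rows (respondents); each cell is 0, 1 or None (missing).
    Only observed cells enter the likelihood. Returns (theta, a, b):
    person abilities, item discriminations and item difficulties.
    """
    n, m = len(data), len(data[0])
    theta, a, b = [0.0] * n, [1.0] * m, [0.0] * m
    n_obs_person = [max(1, sum(x is not None for x in row)) for row in data]
    n_obs_item = [max(1, sum(row[j] is not None for row in data)) for j in range(m)]
    for _ in range(n_iter):
        g_theta, g_a, g_b = [0.0] * n, [0.0] * m, [0.0] * m
        for i, row in enumerate(data):
            for j, x in enumerate(row):
                if x is None:
                    continue  # missing cells carry no likelihood term
                # residual x - p is d logL / d logit for a Bernoulli outcome
                r = x - sigmoid(a[j] * (theta[i] - b[j]))
                g_theta[i] += r * a[j]
                g_a[j] += r * (theta[i] - b[j])
                g_b[j] -= r * a[j]
        # Gradient ascent on mean gradients; the hypothetical L2 penalty keeps
        # estimates finite for all-0 or all-1 response patterns.
        theta = [t + lr * (g / c - lam * t) for t, g, c in zip(theta, g_theta, n_obs_person)]
        a = [ai + lr * (g / c - lam * (ai - 1.0)) for ai, g, c in zip(a, g_a, n_obs_item)]
        b = [bi + lr * (g / c - lam * bi) for bi, g, c in zip(b, g_b, n_obs_item)]
    return theta, a, b

def impute_2pl(data, theta, a, b):
    """Fill each missing cell with its most probable category and keep P(x = 1)."""
    filled, probs = [], {}
    for i, row in enumerate(data):
        new_row = []
        for j, x in enumerate(row):
            if x is None:
                p = sigmoid(a[j] * (theta[i] - b[j]))
                probs[(i, j)] = p
                new_row.append(1 if p >= 0.5 else 0)
            else:
                new_row.append(x)
        filled.append(new_row)
    return filled, probs
```

For ordinal or nominal items, the 2PL would be replaced by a polytomous model such as Samejima's graded response model or Bock's nominal response model, with each missing cell assigned to the category of highest predicted probability. Keeping `probs` alongside the filled table mirrors the abstract's point that an IRT-based approach yields probabilities of category membership rather than only a point estimate.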

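The first assessment described in the abstract (hide cells at a chosen proportion, with or without systematic missingness, then score how well a method reproduces the hidden values) can be sketched as below. This is a minimal illustration assuming categorical data stored as lists of rows; the helper names `mask_cells`, `knn_impute` and `imputation_accuracy` are hypothetical, the "systematic" mechanism (only cells equal to 1 may be hidden) is an illustrative choice rather than the paper's, and the kNN imputer is a simple Hamming-distance majority vote, not the specific kNN implementation the authors used.

```python
import random
from collections import Counter

def mask_cells(data, proportion, systematic=False, seed=0):
    """Hide a share of the observed cells; return (masked copy, {(i, j): true value}).

    systematic=False hides cells uniformly at random (MCAR-style).
    systematic=True only hides cells whose value is 1, an illustrative
    value-dependent (MNAR-style) mechanism.
    """
    rng = random.Random(seed)
    n_observed = sum(x is not None for row in data for x in row)
    eligible = [(i, j) for i, row in enumerate(data) for j, x in enumerate(row)
                if x is not None and (not systematic or x == 1)]
    k = min(round(proportion * n_observed), len(eligible))
    masked = [list(row) for row in data]
    hidden = {}
    for i, j in rng.sample(eligible, k):
        hidden[(i, j)] = masked[i][j]
        masked[i][j] = None
    return masked, hidden

def knn_impute(data, k=3):
    """Fill each missing cell by majority vote among the k rows closest in
    Hamming distance over co-observed columns (distance 1.0 if none are shared)."""
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, x in enumerate(row):
            if x is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue  # a neighbour must have column j observed
                shared = [(u, v) for u, v in zip(row, other)
                          if u is not None and v is not None]
                dist = sum(u != v for u, v in shared) / len(shared) if shared else 1.0
                candidates.append((dist, r, other[j]))
            candidates.sort()
            votes = Counter(value for _, _, value in candidates[:k])
            if votes:  # leave the cell missing if no other row observes column j
                filled[i][j] = votes.most_common(1)[0][0]
    return filled

def imputation_accuracy(filled, hidden):
    """Share of hidden cells whose imputed value equals the true value."""
    return sum(filled[i][j] == v for (i, j), v in hidden.items()) / len(hidden)
```

Any imputer that returns a filled copy of the table can be scored the same way. A fuller replication would repeat this over several seeds, missingness proportions and mechanisms, and add the abstract's second assessment: the predictive performance of a model trained on the imputed data.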

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/582a/11247932/f0990a735d10/nihpp-rs4529519v1-f0001.jpg

Similar articles

1
IRTCI: Item Response Theory for Categorical Imputation.
Res Sq. 2024 Jul 2:rs.3.rs-4529519. doi: 10.21203/rs.3.rs-4529519/v1.
2
Advanced methods for missing values imputation based on similarity learning.
PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
3
Missing value imputation in high-dimensional phenomic data: imputable or not, and how?
BMC Bioinformatics. 2014 Nov 5;15(1):346. doi: 10.1186/s12859-014-0346-6.
4
NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data.
Metabolomics. 2018 Nov 23;14(12):153. doi: 10.1007/s11306-018-1451-8.
5
A Workflow for Missing Values Imputation of Untargeted Metabolomics Data.
Metabolites. 2020 Nov 26;10(12):486. doi: 10.3390/metabo10120486.
6
Comparison of Imputation Methods for Categorical Real-World Prostate Cancer Data with Natural Order.
Stud Health Technol Inform. 2024 Aug 22;316:1800-1804. doi: 10.3233/SHTI240780.
7
A real data-driven simulation strategy to select an imputation method for mixed-type trait data.
PLoS Comput Biol. 2023 Mar 22;19(3):e1010154. doi: 10.1371/journal.pcbi.1010154. eCollection 2023 Mar.
8
A nonparametric multiple imputation approach for missing categorical data.
BMC Med Res Methodol. 2017 Jun 6;17(1):87. doi: 10.1186/s12874-017-0360-2.
9
The impact of imputation quality on machine learning classifiers for datasets with missing values.
Commun Med (Lond). 2023 Oct 6;3(1):139. doi: 10.1038/s43856-023-00356-z.
10
Generative adversarial networks for imputing missing data for big data clinical research.
BMC Med Res Methodol. 2021 Apr 20;21(1):78. doi: 10.1186/s12874-021-01272-3.
