Comparison of Imputation Methods for Categorical Real-World Prostate Cancer Data with Natural Order.

Affiliations

Johannes Gutenberg University, Mainz, Germany.

Cancer Registry of Rhineland-Palatinate in the Institute for Digital Health Data, Germany.

Publication information

Stud Health Technol Inform. 2024 Aug 22;316:1800-1804. doi: 10.3233/SHTI240780.

DOI: 10.3233/SHTI240780
PMID: 39176840
Abstract

Missing values (NA) often occur in cancer research, for reasons such as data protection, data loss, or missing follow-up data. Such incomplete patient information can affect prediction models and other data analyses. Imputation methods are one tool for dealing with NA. Cancer data is often ordered categorical, as with tumour grading and staging, which requires special methods. This work compares mode imputation, k-nearest-neighbour (knn) imputation, and, within Multiple Imputation by Chained Equations (MICE), proportional odds logistic regression (mice_polr) and random forest (mice_rf) on a real-world prostate cancer dataset provided by the Cancer Registry of Rhineland-Palatinate in Germany. Our dataset contains information relevant to the risk classification of patients and the time between date of diagnosis and date of death. For the imputation comparison, we use Rubin's (1974) Missing Completely At Random (MCAR) mechanism to remove 10%, 20%, 30%, and 50% of the observations. The results are evaluated and ranked by accuracy per patient. mice_rf performs significantly best at every percentage of NA, followed by knn, while mice_polr performs significantly worst. Furthermore, our findings indicate that imputation accuracy increases with a lower number of categories, a relatively even distribution of patients across categories, or a majority of patients in a particular category.
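The evaluation design described in the abstract (mask observed values under MCAR, impute them, score the imputations per patient) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: variables are assumed to be ordinal categories coded as integers, the kNN distance (mean absolute difference of codes) and the use of the neighbours' `median_low` are hypothetical choices that keep imputations inside the ordered category set, and "accuracy per patient" is assumed to mean the share of a patient's removed values that were recovered, averaged over patients. The MICE variants (mice_polr, mice_rf) are not reproduced here.

```python
import random
import statistics
from collections import Counter

# Minimal sketch, NOT the authors' implementation. Assumptions (hypothetical):
# each patient is a dict of ordinal categorical variables coded as integers
# (1 = lowest category), and None marks a missing value (NA).


def mask_mcar(data, fraction, seed=0):
    """Delete `fraction` of the observed cells completely at random (MCAR).

    Returns a masked copy and the (patient index, variable) pairs removed."""
    rng = random.Random(seed)
    cells = [(i, var) for i, row in enumerate(data)
             for var, value in row.items() if value is not None]
    removed = rng.sample(cells, round(fraction * len(cells)))
    masked = [dict(row) for row in data]
    for i, var in removed:
        masked[i][var] = None
    return masked, removed


def mode_impute(data):
    """Replace each NA with the most frequent observed category of its variable."""
    modes = {}
    for var in {v for row in data for v in row}:
        counts = Counter(row[var] for row in data if row.get(var) is not None)
        modes[var] = counts.most_common(1)[0][0] if counts else None
    return [{var: modes[var] if value is None else value for var, value in row.items()}
            for row in data]


def knn_impute(data, k=3):
    """Fill each NA from the k nearest patients that observe that variable.

    Assumed distance: mean absolute difference of ordinal codes over the
    variables both patients observe. The neighbours' median_low is used so
    the imputed value is always an existing, ordered category."""
    result = [dict(row) for row in data]
    for i, row in enumerate(data):
        for var, value in row.items():
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(data):
                if j == i or other.get(var) is None:
                    continue
                shared = [c for c in row
                          if row[c] is not None and other.get(c) is not None]
                if shared:
                    dist = sum(abs(row[c] - other[c]) for c in shared) / len(shared)
                    candidates.append((dist, j))
            candidates.sort()
            neighbours = [data[j][var] for _, j in candidates[:k]]
            if neighbours:  # otherwise the NA is left in place
                result[i][var] = statistics.median_low(neighbours)
    return result


def accuracy_per_patient(truth, imputed, removed):
    """Share of a patient's removed cells imputed correctly, averaged over patients.

    This reading of "accuracy per patient" is an assumption."""
    per_patient = {}
    for i, var in removed:
        hits, total = per_patient.get(i, (0, 0))
        per_patient[i] = (hits + (imputed[i][var] == truth[i][var]), total + 1)
    return sum(h / t for h, t in per_patient.values()) / len(per_patient)


if __name__ == "__main__":
    # Synthetic toy data only; this is NOT the Rhineland-Palatinate registry data.
    rng = random.Random(42)
    patients = []
    for _ in range(200):
        grade = rng.randint(1, 5)
        stage = min(4, max(1, grade - 1 + rng.randint(0, 1)))
        patients.append({"grade": grade, "stage": stage,
                         "risk": 1 + (grade >= 3) + (stage >= 3)})
    for fraction in (0.1, 0.2, 0.3, 0.5):
        masked, removed = mask_mcar(patients, fraction, seed=1)
        for name, method in (("mode", mode_impute), ("knn", knn_impute)):
            acc = accuracy_per_patient(patients, method(masked), removed)
            print(f"{fraction:.0%} NA, {name}: {acc:.3f}")
```

The accuracies printed by the toy run say nothing about the study's findings. In R, the two MICE variants would likely correspond to the `mice` package's `method = "polr"` and `method = "rf"` options, which this sketch does not attempt to replicate.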

Similar articles

1. Comparison of Imputation Methods for Categorical Real-World Prostate Cancer Data with Natural Order.
   Stud Health Technol Inform. 2024 Aug 22;316:1800-1804. doi: 10.3233/SHTI240780.
2. Imputation of missing values of tumour stage in population-based cancer registration.
   BMC Med Res Methodol. 2011 Sep 19;11:129. doi: 10.1186/1471-2288-11-129.
3. A real data-driven simulation strategy to select an imputation method for mixed-type trait data.
   PLoS Comput Biol. 2023 Mar 22;19(3):e1010154. doi: 10.1371/journal.pcbi.1010154. eCollection 2023 Mar.
4. A nonparametric multiple imputation approach for missing categorical data.
   BMC Med Res Methodol. 2017 Jun 6;17(1):87. doi: 10.1186/s12874-017-0360-2.
5. Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets.
   BMC Med Res Methodol. 2024 Feb 16;24(1):41. doi: 10.1186/s12874-024-02173-x.
6. Advanced methods for missing values imputation based on similarity learning.
   PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
7. Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values.
   Comput Biol Med. 2015 Apr;59:125-133. doi: 10.1016/j.compbiomed.2015.02.006. Epub 2015 Feb 16.
8. IRTCI: Item Response Theory for Categorical Imputation.
   Res Sq. 2024 Jul 2:rs.3.rs-4529519. doi: 10.21203/rs.3.rs-4529519/v1.
9. Missing value imputation in high-dimensional phenomic data: imputable or not, and how?
   BMC Bioinformatics. 2014 Nov 5;15(1):346. doi: 10.1186/s12859-014-0346-6.
10. Missing data imputation using classification and regression trees.
    PeerJ Comput Sci. 2024 Jun 28;10:e2119. doi: 10.7717/peerj-cs.2119. eCollection 2024.

Cited by

1. Machine learning models to predict osteoporosis in patients with chronic kidney disease stage 3-5 and end-stage kidney disease.
   Sci Rep. 2025 Apr 3;15(1):11391. doi: 10.1038/s41598-025-95928-5.