
A novel weighted distance threshold method for handling medical missing values.

Authors

Cheng Ching-Hsue, Chang Jing-Rong, Huang Hao-Hsuan

Affiliations

Department of Information Management, National Yunlin University of Science & Technology, 123, section 3, University Road, Touliu, Yunlin 640, Taiwan.

Department of Information Management, Chaoyang University of Technology, Taichung, Taiwan.

Publication information

Comput Biol Med. 2020 Jul;122:103824. doi: 10.1016/j.compbiomed.2020.103824. Epub 2020 May 30.

DOI: 10.1016/j.compbiomed.2020.103824
PMID: 32658729

Abstract

Data in the medical field often contain missing values and may result in biased research results. Therefore, the objective of this work is to propose a new imputation method, a novel weighted distance threshold method, to impute missing values. After several experiments, we find that the proposed imputation method has the following benefits. (1) The proposed method with purity can reassign instances into the nearest class of the dataset, and the purity computation can filter outliers; (2) The proposed method redefines the degree of missing values and can determine attributes and instances relative to the missing values in different datasets; and (3) The proposed method need not set the k value of the nearest neighborhood because this study identifies the k value based on the best threshold to calculate purity to enhance the results of imputation. In addition, the distance threshold can adjust the optimal nearest neighborhood to estimate missing values. This study implements several experiments to compare the proposed method with other imputation methods using different missing types, missing degrees, and types of datasets. The results indicate that the proposed imputation method is better than the listed methods. Moreover, this study uses the stroke dataset from the International Stroke Trial (IST) to verify whether the proposed method can be effectively applied in practice, and the results show that the proposed method achieves 90% accuracy in the Stroke dataset.
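
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: choosing neighbors by a distance threshold instead of a fixed k, then weighting them by distance. It is not the authors' method. The function name `impute_with_distance_threshold`, the RMS distance over shared features, the inverse-distance weights, and the column-mean fallback are all assumptions made for illustration, and the purity-based class reassignment and threshold search mentioned in the abstract are left out.

```python
import math


def impute_with_distance_threshold(rows, threshold):
    """Fill None entries from neighbors whose distance is <= threshold.

    Illustrative only (hypothetical helper, not the paper's algorithm).
    rows: list of equal-length lists of floats, with None marking a missing value.
    """

    def distance(a, b):
        # Root-mean-square difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    # Column means serve as a fallback when no neighbor is close enough.
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else None)

    eps = 1e-9  # avoids division by zero for identical rows
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            num = den = 0.0
            for k, other in enumerate(rows):
                if k == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d <= threshold:
                    # The neighborhood size is set by the threshold, not by k.
                    w = 1.0 / (d + eps)
                    num += w * other[j]
                    den += w
            filled[i][j] = num / den if den > 0 else col_means[j]
    return filled


# Toy data: the second row is missing its second feature.
data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.2, 2.4]]
near = impute_with_distance_threshold(data, threshold=0.5)
far = impute_with_distance_threshold(data, threshold=0.01)
```

In this toy data, a threshold of 0.5 admits the two nearby rows and leaves out the distant one, so the filled value lands close to 2.2. A threshold of 0.01 admits no neighbor, and the sketch falls back to the column mean.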

Similar articles

1
A novel weighted distance threshold method for handling medical missing values.
Comput Biol Med. 2020 Jul;122:103824. doi: 10.1016/j.compbiomed.2020.103824. Epub 2020 May 30.
2
Advanced methods for missing values imputation based on similarity learning.
PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
3
On mining incomplete medical datasets: Ordering imputation and classification.
Technol Health Care. 2015;23(5):619-25. doi: 10.3233/THC-151018.
4
Missing value imputation in high-dimensional phenomic data: imputable or not, and how?
BMC Bioinformatics. 2014 Nov 5;15(1):346. doi: 10.1186/s12859-014-0346-6.
5
Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach.
BMC Med Inform Decis Mak. 2020 Aug 20;20(Suppl 5):174. doi: 10.1186/s12911-020-01166-2.
6
Missing value imputation for gene expression data by tailored nearest neighbors.
Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):95-106. doi: 10.1515/sagmb-2015-0098.
7
Imputation methods for high-dimensional mixed-type datasets by nearest neighbors.
Comput Biol Med. 2021 Aug;135:104577. doi: 10.1016/j.compbiomed.2021.104577. Epub 2021 Jun 17.
8
Two-pass imputation algorithm for missing value estimation in gene expression time series.
J Bioinform Comput Biol. 2007 Oct;5(5):1005-22. doi: 10.1142/s0219720007003053.
9
R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data.
Comput Methods Programs Biomed. 2020 Feb;184:105122. doi: 10.1016/j.cmpb.2019.105122. Epub 2019 Oct 8.
10
rMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data.
Comput Biol Med. 2021 Nov;138:104911. doi: 10.1016/j.compbiomed.2021.104911. Epub 2021 Sep 29.

Cited by

1
Conceptual framework as a guide to choose appropriate imputation method for missing values in a clinical structured dataset.
BMC Med Res Methodol. 2025 Feb 20;25(1):43. doi: 10.1186/s12874-025-02496-3.
2
A novel MissForest-based missing values imputation approach with recursive feature elimination in medical applications.
BMC Med Res Methodol. 2024 Nov 8;24(1):269. doi: 10.1186/s12874-024-02392-2.
3
Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.
BMC Med Res Methodol. 2024 Aug 28;24(1):188. doi: 10.1186/s12874-024-02310-6.
4
Missing Value Imputation Method for Multiclass Matrix Data Based on Closed Itemset.
Entropy (Basel). 2022 Feb 16;24(2):286. doi: 10.3390/e24020286.