Deep learning based decision tree ensembles for incomplete medical datasets.

Author information

Division of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan.

Department of Information Management, National Central University, Taoyuan, Taiwan.

Publication information

Technol Health Care. 2024;32(1):75-87. doi: 10.3233/THC-220514.

Abstract

BACKGROUND

In practice, datasets collected for analysis are often incomplete because some records contain missing attribute values. Many related works build dedicated models that estimate replacements for the missing values, turning the original incomplete dataset into a complete one. An alternative is to handle the incomplete dataset directly, without missing value imputation; decision trees are the main technique used for this purpose.

OBJECTIVE

To introduce a novel approach, Deep Learning-based Decision Tree Ensembles (DLDTE), which borrows the bounding box and sliding window strategies used in deep learning to divide an incomplete dataset into a number of subsets and learns from each subset with a decision tree, resulting in a decision tree ensemble.
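
The abstract does not specify how the window is defined, so the following is only a minimal sketch of one possible reading of the subset-division step: a fixed-width window slides across the feature columns, and each window keeps the rows that are fully observed inside it. The function name `sliding_window_subsets`, the `width` and `stride` parameters, and the use of `None` to mark missing values are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # assumption: None marks a missing attribute value


def sliding_window_subsets(
    X: List[Row], y: List[int], width: int, stride: int
) -> List[Tuple[range, List[List[float]], List[int]]]:
    """Slide a window of `width` feature columns across the dataset.

    For each window position, keep only the rows that are fully observed
    inside the window, which yields a complete subset without imputation.
    This is a hypothetical interpretation of the division step.
    """
    n_features = len(X[0])
    subsets = []
    for start in range(0, n_features - width + 1, stride):
        cols = range(start, start + width)
        rows, labels = [], []
        for row, label in zip(X, y):
            window = [row[c] for c in cols]
            if all(v is not None for v in window):
                rows.append(window)
                labels.append(label)
        if rows:  # skip windows in which no row is complete
            subsets.append((cols, rows, labels))
    return subsets


# Toy data: 3 samples, 4 features, one missing value per sample.
X = [
    [1.0, None, 3.0, 4.0],
    [2.0, 5.0, None, 1.0],
    [0.5, 2.5, 3.5, None],
]
y = [0, 1, 0]

subsets = sliding_window_subsets(X, y, width=2, stride=1)
for cols, rows, labels in subsets:
    print(list(cols), rows, labels)
```

Under this reading, each `(cols, rows, labels)` triple would be passed to a separate decision tree learner (for example scikit-learn's `DecisionTreeClassifier`), and the trees' predictions would then be combined, for instance by majority vote. The combination rule is likewise an assumption here.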

METHOD

Two medical-domain datasets containing several hundred feature dimensions, with missing rates of 10% to 50%, are used for performance comparison.

RESULTS

The proposed DLDTE achieves the highest classification accuracy compared with the baseline decision tree method, two missing value imputation methods (mean and k-nearest neighbor), and the case deletion method.
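
For readers unfamiliar with the comparison methods named above, here is a rough sketch of case deletion, mean imputation, and k-nearest neighbor imputation in plain Python. The abstract does not give the configuration used in the study, so the distance measure (mean squared difference over features observed in both rows), the value of `k`, and all function names are assumptions made for illustration.

```python
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]  # assumption: None marks a missing attribute value


def case_deletion(X: List[Row]) -> List[Row]:
    """Drop every row that has at least one missing value."""
    return [row for row in X if all(v is not None for v in row)]


def mean_imputation(X: List[Row]) -> List[List[float]]:
    """Replace each missing value with the mean of its column's observed values."""
    # Assumes every column has at least one observed value.
    col_means = [mean(v for v in col if v is not None) for col in zip(*X)]
    return [[m if v is None else v for v, m in zip(row, col_means)] for row in X]


def _distance(a: Row, b: Row) -> float:
    """Mean squared difference over features observed in both rows (assumed metric)."""
    shared = [(u - v) ** 2 for u, v in zip(a, b) if u is not None and v is not None]
    return sum(shared) / len(shared) if shared else float("inf")


def knn_imputation(X: List[Row], k: int) -> List[List[float]]:
    """Fill each missing value with the mean of that feature over the k nearest donors."""
    filled = [list(row) for row in X]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            if value is None:
                # Donors are other rows that observe feature j (assumes at least one exists).
                donors = [o for m, o in enumerate(X) if m != i and o[j] is not None]
                donors.sort(key=lambda o: _distance(row, o))
                filled[i][j] = mean(d[j] for d in donors[:k])
    return filled


X = [
    [1.0, 2.0, 3.0],
    [1.0, None, 5.0],
    [9.0, 8.0, 7.0],
    [2.0, 4.0, None],
]

print(case_deletion(X))
print(mean_imputation(X))
print(knn_imputation(X, k=1))
```

Case deletion shrinks the training set as the missing rate grows, while the two imputation methods keep every row but introduce estimated values, which is the trade-off the comparison in the abstract is probing.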

CONCLUSION

The results demonstrate the effectiveness of DLDTE for handling incomplete medical datasets with different missing rates.
