Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases.

Author affiliation

Department of Computer and Information Science, Faculty of Science and Engineering, Kasetsart University, Chalermphrakiat Sakonnakhon Province Campus, Sakonnakhon, 47000, Thailand.

Publication information

BMC Med Inform Decis Mak. 2023 Jul 28;23(1):141. doi: 10.1186/s12911-023-02228-x.

Abstract

BACKGROUND

A decision tree is a crucial tool for describing the factors related to cardiovascular disease (CVD) risk and for predicting and explaining that risk for patients. Notably, the decision tree must be simplified because patients may differ in the primary topics or factors related to their CVD risk. Data collected from multiple environmental heart disease risk datasets can be described by many decision trees, that is, a forest, in which each tree describes the CVD risk for one primary topic.

METHODS

We demonstrate the presence of trees, or a forest, using an integrated CVD dataset obtained from multiple datasets. Moreover, we apply a novel method to an association-rule tree to discover each primary topic hidden within a dataset. To generalize the tree structure for descriptive tasks, each primary topic is a boundary node acting as a root node of a C4.5 tree with the least prodigality for the tree structure (PTS). All trees are assigned to a descriptive forest describing the CVD risks in a dataset. A descriptive forest is used to describe each CVD patient's primary risk topics and related factors. We describe eight primary topics in a descriptive forest acquired from 918 records of a heart failure-prediction dataset with 11 features obtained from five datasets. We apply the proposed method to 253,680 records with 22 features from imbalanced classes of a heart disease health-indicators dataset.
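The one-tree-per-primary-topic idea can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the primary topics are supplied by hand as (feature, value) pairs rather than discovered from an association-rule tree, the feature names and toy records are hypothetical, the PTS measure is not implemented because the abstract does not define it, and only the first split under each topic is chosen (using gain ratio, the split criterion of C4.5) instead of growing a full tree.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(records, feature, target):
    """Gain ratio of splitting `records` on a categorical `feature`."""
    labels = [r[target] for r in records]
    groups = {}
    for r in records:
        groups.setdefault(r[feature], []).append(r[target])
    total = len(records)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy([r[feature] for r in records])
    if split_info == 0:
        return 0.0  # the feature takes a single value, so it cannot split
    return (entropy(labels) - remainder) / split_info


def build_forest(records, topics, features, target):
    """For each topic (feature, value), treat it as a root condition, take the
    matching sub-dataset, and pick the best next split by gain ratio."""
    forest = {}
    for topic_feature, topic_value in topics:
        subset = [r for r in records if r[topic_feature] == topic_value]
        candidates = [f for f in features if f != topic_feature]
        if not subset or not candidates:
            continue
        best = max(candidates, key=lambda f: gain_ratio(subset, f, target))
        forest[(topic_feature, topic_value)] = {
            "n_records": len(subset),
            "first_split": best,
        }
    return forest


# Hypothetical toy records; names and values are illustrative only.
records = [
    {"chest_pain": "ASY", "exercise_angina": "Y", "sex": "M", "cvd": 1},
    {"chest_pain": "ASY", "exercise_angina": "Y", "sex": "F", "cvd": 1},
    {"chest_pain": "ASY", "exercise_angina": "N", "sex": "M", "cvd": 0},
    {"chest_pain": "ASY", "exercise_angina": "N", "sex": "F", "cvd": 0},
    {"chest_pain": "ATA", "exercise_angina": "N", "sex": "M", "cvd": 1},
    {"chest_pain": "ATA", "exercise_angina": "Y", "sex": "M", "cvd": 1},
    {"chest_pain": "ATA", "exercise_angina": "N", "sex": "F", "cvd": 0},
    {"chest_pain": "ATA", "exercise_angina": "Y", "sex": "F", "cvd": 0},
]
features = ["chest_pain", "exercise_angina", "sex"]
topics = [("chest_pain", "ASY"), ("chest_pain", "ATA")]  # assumed, not discovered

forest = build_forest(records, topics, features, target="cvd")
for topic, tree in forest.items():
    print(topic, tree)
```

In this toy data the two topics lead to different follow-up factors, which is the kind of per-topic description a single tree rooted at one attribute would not separate as directly.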

RESULTS

The usability of the descriptive forest is demonstrated by a comparative study (on qualitative and quantitative tasks of the CVD-risk explanation) with a C4.5 tree generated from the same dataset but with the least PTS. The qualitative descriptive task confirms that, compared to a single C4.5 tree, the descriptive forest is more flexible and can better describe the CVD risk, whereas the quantitative descriptive task confirms that it achieves higher coverage (recall) and correctness (accuracy and precision) and provides more detailed explanations. Additionally, for these tasks, the descriptive forest still outperforms the C4.5 tree. To reduce the problem of imbalanced classes, the ratio of classes in each subdataset generating each tree is investigated.
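The quantitative measures named above (recall as coverage, accuracy and precision as correctness) and the per-subdataset class ratio can be computed as in the following sketch. The labels and predictions are hypothetical toy values, not results from the study, and the function names are illustrative.

```python
def describe_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary task.
    A ratio with a zero denominator is reported as 0.0 (a convention assumed here)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def class_ratio(labels, positive=1):
    """Fraction of positive-class records in a sub-dataset."""
    return sum(1 for y in labels if y == positive) / len(labels) if labels else 0.0


# Hypothetical toy labels for one sub-dataset.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

metrics = describe_metrics(y_true, y_pred)
ratio = class_ratio(y_true)
print(metrics, ratio)
```

Reporting the class ratio next to the metrics for each sub-dataset makes it easier to see whether a tree was generated from a heavily imbalanced subset.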

CONCLUSION

The results provide confidence for using the descriptive forest.
