Lessons learned from data mining of WHO mortality database.

Author information

Paoin W

Affiliation

Faculty of Medicine, Thammasat University, Rangsit Campus, Paholyotin Road, Pathumthani 12120, Thailand.

Publication information

Methods Inf Med. 2011;50(4):380-5. doi: 10.3414/ME10-02-0019. Epub 2011 Jun 21.

DOI:10.3414/ME10-02-0019
PMID:21691674
Abstract

OBJECTIVES

The objectives of this research were to test the ability of classification algorithms to predict the cause of death in mortality records with unknown causes, to find associations between common causes of death, to identify groups of countries based on their common causes of death, and to extract knowledge from data mining of the World Health Organization (WHO) mortality database.

METHODS

The WEKA software, version 3.5.3, was used for classification, clustering and association analysis of the World Health Organization mortality database, which contained 1,109,537 records. Three major steps were performed: Step 1, preprocessing of the data to convert all records into formats suitable for each type of analysis algorithm; Step 2, analysis of the data using the C4.5 decision tree and Naïve Bayes classification algorithms, the K-means clustering algorithm and the Apriori association analysis algorithm; Step 3, interpretation of the results and hypothesis testing after cluster analysis.
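Step 2 names K-means among the algorithms. To illustrate what that clustering step does, the sketch below implements plain K-means (Lloyd's algorithm) in pure Python. It is not WEKA's own K-means implementation and not the study's configuration. Several details are hypothetical assumptions made for illustration: the feature representation (each country-year as a vector of cause-of-death shares), the first-k-points initialization and the data values.

```python
# Minimal K-means (Lloyd's algorithm) sketch, NOT WEKA's implementation.
# Assumption: each country-year is a vector of cause-of-death shares;
# the values below are hypothetical, not WHO data.
from math import dist


def kmeans(points, k, max_iter=100):
    # Deterministic initialisation (assumption): first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical shares of deaths: (cardiovascular, cancer, infectious).
country_years = [
    (0.50, 0.30, 0.05),
    (0.10, 0.10, 0.60),
    (0.52, 0.28, 0.06),
    (0.12, 0.08, 0.62),
    (0.48, 0.32, 0.04),
    (0.09, 0.11, 0.58),
]
labels, centroids = kmeans(country_years, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

With k = 2, the six hypothetical country-years split into the two obvious groups; the study itself reports four clusters. Production implementations usually use randomized or k-means++ initialization rather than the first k points. The deterministic choice here only keeps the example reproducible.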

RESULTS

Using a C4.5 decision tree classifier to predict cause of death, we obtained a tree with 440 leaf nodes that correctly classified death instances with an accuracy of 40.06%. The Naïve Bayes classification algorithm, which calculated the probability of death from each disease, correctly classified death instances with an accuracy of 28.13%. K-means clustering divided the data into four clusters containing 189, 59, 65 and 144 country-years, respectively. A chi-square test was used to test for differences in diseases across the clusters, each of which had different diseases as its predominant causes of death. Apriori association analysis produced association rules linking cancer of the lung, hypertension and cerebrovascular diseases. These were found among the top five leading causes of death with a 99-100% confidence level.
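The 99-100% figure above is a rule confidence. The minimal sketch below shows how support and confidence are computed for a rule such as {lung cancer, hypertension} → {cerebrovascular disease}. It assumes each country-year is encoded as the set of its top five causes of death; this encoding is hypothetical, since the abstract does not give the study's exact preprocessing. The transactions and item names are invented for illustration. The level-wise candidate generation that makes Apriori efficient is deliberately left out.

```python
# Support and confidence of an association rule X -> Y.
# Only the rule metrics are shown; Apriori's level-wise candidate
# generation and minimum-support pruning are omitted.
# Assumption: each transaction is the set of top-five causes of death
# for one country-year; the transactions and item names are hypothetical
# ("ihd" stands for ischaemic heart disease).


def support(transactions, itemset):
    # Fraction of transactions that contain every item in the itemset.
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    # conf(X -> Y) = support(X ∪ Y) / support(X)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # antecedent never occurs; treat confidence as 0
    return support(transactions, set(antecedent) | set(consequent)) / base


transactions = [
    {"lung_cancer", "hypertension", "cerebrovascular", "ihd", "copd"},
    {"lung_cancer", "hypertension", "cerebrovascular", "ihd", "diabetes"},
    {"hypertension", "cerebrovascular", "ihd", "diabetes", "pneumonia"},
    {"tuberculosis", "hiv", "diarrhoea", "pneumonia", "malaria"},
]

strong = confidence(transactions, {"lung_cancer", "hypertension"}, {"cerebrovascular"})
weak = confidence(transactions, {"hypertension"}, {"lung_cancer"})
print(strong, round(weak, 3))  # 1.0 0.667
```

A confidence of 1.0 means every hypothetical country-year with both antecedent causes in its top five also had cerebrovascular disease there. A different rule over overlapping items can score much lower, which is why confidence is reported per rule.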

CONCLUSION

Classification tools produced the poorest results in predicting cause of death. Given the inadequacy of variables in the WHO database, creating a classification model to predict specific causes of death was impossible. Clustering and association tools yielded interesting results that could be used to identify new areas of interest in mortality data analysis. These tools can be used in data mining analysis to help solve some quality problems in mortality data.

Similar articles

1
Lessons learned from data mining of WHO mortality database.
Methods Inf Med. 2011;50(4):380-5. doi: 10.3414/ME10-02-0019. Epub 2011 Jun 21.
2
[Research on medical data mining and its applications].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2014 Oct;31(5):1182-6.
3
A novel method for predicting kidney stone type using ensemble learning.
Artif Intell Med. 2018 Jan;84:117-126. doi: 10.1016/j.artmed.2017.12.001. Epub 2017 Dec 11.
4
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems.
Comput Math Methods Med. 2015;2015:802754. doi: 10.1155/2015/802754. Epub 2015 Jul 29.
5
Discovering metric temporal constraint networks on temporal databases.
Artif Intell Med. 2013 Jul;58(3):139-54. doi: 10.1016/j.artmed.2013.03.006. Epub 2013 May 6.
6
A novel decision-tree method for structured continuous-label classification.
IEEE Trans Cybern. 2013 Dec;43(6):1734-46. doi: 10.1109/TSMCB.2012.2229269.
7
Influence of data mining technology in information analysis of human resource management on macroscopic economic management.
PLoS One. 2021 May 18;16(5):e0251483. doi: 10.1371/journal.pone.0251483. eCollection 2021.
8
A Data Mining Algorithm for Association Rules with Chronic Disease Constraints.
Comput Intell Neurosci. 2022 Aug 23;2022:8526256. doi: 10.1155/2022/8526256. eCollection 2022.
9
Applying data mining for the analysis of breast cancer data.
Methods Mol Biol. 2015;1246:175-89. doi: 10.1007/978-1-4939-1985-7_12.
10
Restricted Versus Unrestricted Search Space: Experience from Mining a Large Japanese Database.
Stud Health Technol Inform. 2015;216:1072.

Cited by

1
Mortality Prediction from Hospital-Acquired Infections in Trauma Patients Using an Unbalanced Dataset.
Healthc Inform Res. 2020 Oct;26(4):284-294. doi: 10.4258/hir.2020.26.4.284. Epub 2020 Oct 31.
2
Utilizing Electronic Medical Records to Discover Changing Trends of Medical Behaviors Over Time.
Methods Inf Med. 2017 May 5;56(S 01):e49-e66. doi: 10.3414/ME16-01-0047.