Feature selection and classification model construction on type 2 diabetic patients' data.

Authors

Huang Yue, McCullagh Paul, Black Norman, Harper Roy

Affiliation

Department of Computing, Faculty of Engineering, Imperial College London, South Kensington, London SW7 2AZ, UK.

Publication information

Artif Intell Med. 2007 Nov;41(3):251-62. doi: 10.1016/j.artmed.2007.07.002. Epub 2007 Aug 17.

Abstract

OBJECTIVE

Diabetes affects between 2% and 4% of the global population (up to 10% in the over 65 age group), and its avoidance and effective treatment are undoubtedly crucial public health and health economics issues in the 21st century. The aim of this research was to identify significant factors influencing diabetes control, by applying feature selection to a working patient management system to assist with ranking, classification and knowledge discovery. The classification models can be used to determine individuals in the population with poor diabetes control status based on physiological and examination factors.

METHODS

The diabetic patients' information was collected by the Ulster Community and Hospitals Trust (UCHT) from 2000 to 2004 as part of clinical management. Data mining techniques were applied to discover key predictors and latent knowledge. To improve computational efficiency, a feature selection technique, feature selection via supervised model construction (FSSMC), an optimisation of ReliefF, was used to rank the important attributes affecting diabetic control. After suitable features had been selected, three complementary classification techniques (Naïve Bayes, IB1 and C4.5) were applied to the data to predict how well the patients' condition was controlled.
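To illustrate the kind of attribute ranking that ReliefF-style methods perform, the following is a minimal sketch of the basic two-class Relief weighting in plain Python. It is not FSSMC and not the authors' implementation; the function name `relief_scores`, the Manhattan distance, the single nearest hit and miss, and the toy data are all assumptions made for illustration. A feature scores highly when it differs between an instance and its nearest neighbour of the other class (the "miss") but not between the instance and its nearest neighbour of the same class (the "hit").

```python
def relief_scores(X, y):
    """Basic two-class Relief weights (illustrative; not FSSMC or ReliefF).

    X: list of numeric feature vectors, each feature scaled to [0, 1].
    y: list of class labels with exactly two distinct values,
       at least two instances per class.
    Every instance is visited once, so the result is deterministic.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance between two feature vectors (an assumption).
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        # Nearest hit: closest other instance with the same class.
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))
        # Nearest miss: closest instance with the other class.
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights


# Toy data (hypothetical, not from the study): feature 0 separates the
# two classes, feature 1 is noise.
X_toy = [[0.0, 0.2], [0.1, 0.9], [0.05, 0.5],
         [0.9, 0.1], [1.0, 0.8], [0.95, 0.4]]
y_toy = [0, 0, 0, 1, 1, 1]
toy_weights = relief_scores(X_toy, y_toy)
```

On this toy data the informative feature receives a positive weight and the noise feature a negative one, so sorting features by weight gives a ranking of the kind the abstract describes.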

RESULTS

FSSMC identified patients' 'age', 'diagnosis duration', the need for 'insulin treatment', 'random blood glucose' measurement and 'diet treatment' as the most important factors influencing blood glucose control. Using the reduced features, a best predictive accuracy of 95% and a sensitivity of 98% were achieved. The observed influence on outcome of factors such as the 'type of care' delivered, the use of 'home monitoring' and 'smoking' can contribute to domain knowledge in diabetes control.
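For readers less familiar with the two metrics reported above, the sketch below shows how accuracy and sensitivity are computed from predicted and true labels. The function name, the toy labels and the choice of which class counts as "positive" are assumptions for illustration; the abstract does not state which control status was treated as the positive class.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for binary labels.

    accuracy    = (TP + TN) / all cases
    sensitivity = TP / (TP + FN), the share of positives that were found
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    sensitivity = tp / (tp + fn)
    return accuracy, sensitivity


# Toy labels (hypothetical, not study data): 1 marks the positive class.
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 0, 0, 1, 0]
toy_accuracy, toy_sensitivity = accuracy_and_sensitivity(y_true_toy, y_pred_toy)
```

Here three of the four positive cases are detected and six of the eight predictions are correct, so both values are 0.75.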

CONCLUSION

In the care of patients with diabetes, the more important factors identified (patients' 'age', 'diagnosis duration' and 'family history') are beyond the control of physicians. Treatment methods such as 'insulin', 'diet' and 'tablets' (a variety of oral medicines) may be controlled. However, lifestyle indicators such as 'body mass index' and 'smoking status' are also important and may be controlled by the patient. This further underlines the need for public health education to aid awareness and prevention. More subtle data interactions need to be better understood, and data mining can contribute to the clinical evidence base. The research confirms and, to a lesser extent, challenges current thinking. Whilst fully appreciating the requirement for clinical verification and interpretation, this work supports the use of data mining as an exploratory tool, particularly as the domain is suffering from a data explosion due to enhanced monitoring and the (potential) storage of this data in the electronic health record. FSSMC has proved a useful feature estimator for large data sets, where processing efficiency is an important factor.
