Sumathi A, Meganathan S
Department of Computer Science Engineering, SASTRA Deemed To be University, Srinivasa Ramanujan Centre (SRC), Kumbakonam, Tamil Nadu, India.
Bioinformation. 2019 Dec 31;15(12):875-882. doi: 10.6026/97320630015875. eCollection 2019.
Diabetes Mellitus is a leading disease worldwide, irrespective of age and geographical location. It is estimated that 43% of the overall population is affected by the disease. Its causes include an inappropriate diet and lifestyle, with allied conditions such as obesity. Prognosis and diagnosis are therefore important for adequate combat and care. Known prognosis-related symptoms include incontinence (inability to control urination) and frequent fatigue. Moreover, early prediction of the disease plays an important role in the prognosis of associated conditions, such as heart failure, that lead to chronic illness. Hence, it is of interest to describe a data mining based prediction model that uses known features (derived from epidemiological data collected from a public hospital using routine tests) to help in the prognosis of the disease. We used data pre-processing techniques to handle missing values and dimensionality reduction models to improve data quality. A Minimum Description Length (MDL) principle model for discretization (replacing a continuum with a finite set of points) was used to reduce the high dimensionality of the dataset, which made it possible to categorize the data into small groups of ordered intervals. Thus, we describe a semi-supervised learning technique (which identifies promising attributes using clustering and classification methods) that combines data mining techniques to achieve reasonable accuracy with adequate sensitivity and specificity, for further discussion, cross-validation, re-evaluation, and application. Early prediction of the disease with improved accuracy, obtained by analysing specificity ranges in blood pressure and glucose levels, will be useful in combating Diabetes Mellitus.
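The MDL-based discretization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which MDL variant the authors used; the code below assumes the widely used entropy-based criterion of Fayyad and Irani, and the function names (`mdl_cut_points`, `discretize`) and the glucose readings are hypothetical, chosen only for illustration.

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Find the boundary with the highest information gain.

    `pairs` is a list of (value, label) sorted by value.
    Returns (gain, split_index, cut_value), or None if no boundary exists.
    """
    labels = [y for _, y in pairs]
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        gain = (base
                - (i / n) * entropy(labels[:i])
                - ((n - i) / n) * entropy(labels[i:]))
        if best is None or gain > best[0]:
            best = (gain, i, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def mdl_accepts(labels, i, gain):
    """Fayyad-Irani MDL stopping rule: accept the cut only if the gain
    exceeds the cost of encoding the cut point and the class statistics."""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (math.log2(n - 1) + delta) / n


def mdl_cut_points(values, labels):
    """Recursively split a numeric attribute into ordered intervals."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []

    def recurse(segment):
        if len(segment) < 2:
            return
        found = best_cut(segment)
        if found is None:
            return
        gain, i, cut = found
        if not mdl_accepts([y for _, y in segment], i, gain):
            return
        cuts.append(cut)
        recurse(segment[:i])
        recurse(segment[i:])

    recurse(pairs)
    return sorted(cuts)


def discretize(value, cuts):
    """Map a numeric value to the index of its ordered interval."""
    return bisect.bisect_right(cuts, value)


# Hypothetical glucose readings (mg/dL) with made-up class labels
# (0 = non-diabetic, 1 = diabetic); not data from the study.
glucose = [85, 90, 95, 100, 105, 110, 150, 155, 160, 165, 170, 175]
outcome = [0] * 6 + [1] * 6
cuts = mdl_cut_points(glucose, outcome)
print(cuts, [discretize(g, cuts) for g in (95, 160)])
```

On this toy input the attribute is split once, between the two class groups, and each reading is then replaced by its interval index. On real data the stopping rule decides how many intervals, if any, each attribute receives.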