Das Sayan, Sil Jaya
Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, West Bengal, India.
Health Inf Sci Syst. 2019 Feb 18;7(1):5. doi: 10.1007/s13755-019-0066-4. eCollection 2019 Dec.
In India, 67% of the total population lives in remote areas, where providing primary healthcare is a real challenge because of the scarcity of doctors. Health kiosks are deployed in remote villages to collect basic health data such as blood pressure, pulse rate, height, weight, BMI, and oxygen saturation level (SpO2). The acquired data are often imprecise because of measurement error and contain missing values. The paper proposes a comprehensive framework that imputes missing symptom values while managing the uncertainty present in the data set.
The data sets are fuzzified to manage uncertainty, and the fuzzy c-means clustering algorithm is applied to group the symptom feature vectors into disease classes. The missing symptom values for each disease are imputed using a multiple fuzzy-based regression model. Relations between the symptoms are framed with the help of experts and the medical literature. Blood pressure is handled with a novel approach because its characteristics differ from those of the other symptoms. Because the patient records obtained from the kiosks are insufficient, relevant data are simulated with the Monte Carlo method to avoid over-fitting while imputing the missing symptom values. The generated data sets are verified using Kullback-Leibler (K-L) distance and distance correlation techniques, which show that the simulated data sets are well correlated with the real data set.
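The clustering step can be illustrated with a minimal NumPy sketch of standard fuzzy c-means. This is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the random initialisation, and the stopping tolerance are all assumptions chosen for illustration, and the rows of `X` stand in for symptom feature vectors.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative sketch, hypothetical parameters).

    X: (n_samples, n_features) array of feature vectors.
    Returns (centers, U), where U[i, j] is the membership of sample i
    in cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Cluster centers: membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero

        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U
```

Each row of `U` gives graded memberships across clusters, which is what lets an imprecise symptom vector belong partly to more than one disease class instead of being forced into a single one.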
Using these data sets, the proposed model is built and new patients are provisionally diagnosed using the Softmax cost function. Diseases are assigned as multiple class labels with about 98% accuracy, verified against the ground truth provided by the experts.
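As a rough illustration of what a Softmax cost function computes, the sketch below shows the usual softmax probabilities, the cross-entropy cost, and a linear prediction rule. The linear form `X @ W + b` and the names `softmax`, `cross_entropy`, and `predict` are assumptions for illustration; the abstract does not specify the model's architecture or training procedure.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row maximum keeps exp() stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class labels."""
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()


def predict(X, W, b):
    """Most probable class per row of X under an assumed linear model."""
    return softmax(X @ W + b).argmax(axis=1)
```

Here `labels` holds integer disease-class indices, and minimising `cross_entropy` over `W` and `b` (for example by gradient descent) would be one conventional way to fit such a classifier.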
It is worth mentioning that the system is intended for primary healthcare; in emergency cases, patients are referred to the experts.