Department of Computer Science and Information Technology, Virtual University of Pakistan, Lahore, Pakistan.
Department of Computer Engineering, Bandirma Onyedi Eylul University, Balıkesir, Turkey.
BMC Med Inform Decis Mak. 2023 Jan 18;23(1):11. doi: 10.1186/s12911-022-02092-1.
Pakistan's rapid population growth has exposed water supplies to a range of contaminants, degrading water quality and driving a dramatic rise in waterborne infections across many regions of the country. As a result, modeling and predicting waterborne diseases has become an active research topic and is important for controlling these diseases.
In our study, we first collected typhoid and malaria patient data for the years 2017-2020 from Ayub Medical Hospital. The dataset has seven input features. We trained and tested several machine learning (ML) models on this dataset using tenfold cross-validation, and then investigated the importance of each input feature for detecting positive cases of waterborne disease. The experimental results showed that Random Forest correctly predicted malaria-positive cases 60% of the time and typhoid-positive cases 77% of the time, outperforming the other ML models. Using the random forest feature selection technique, we found that age, history, and test results play an important role in predicting waterborne disease-positive cases. We conclude that this study could help health departments in different areas reduce the number of people who become ill from contaminated water.
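The tenfold cross-validation procedure described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline. The function names, the synthetic one-feature data, and the simple threshold classifier are all assumptions made for illustration, and the threshold classifier is a stand-in for the random forest used in the study. The score computed is the fraction of truly positive cases that are predicted positive, which is one plausible reading of "correctly predicted positive cases".

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def positive_recall(y_true, y_pred):
    """Fraction of truly positive cases (label 1) that were predicted positive."""
    hits = [pred for true, pred in zip(y_true, y_pred) if true == 1]
    return sum(hits) / len(hits) if hits else float("nan")

def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, and score the pooled predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_true += [y[i] for i in held_out]
        y_pred += [predict(model, X[i]) for i in held_out]
    return positive_recall(y_true, y_pred)

# Hypothetical stand-in model (NOT a random forest): a threshold on the first
# feature, placed midway between the two class means of the training data.
def fit_threshold(X_train, y_train):
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else 0

# Synthetic data (an assumption): positive cases have a higher feature value on average.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[rng.gauss(2.0 * label, 1.0)] for label in y]

recall = cross_validate(X, y, fit_threshold, predict_threshold, k=10)
print(f"pooled positive-class recall over 10 folds: {recall:.2f}")
```

Because `cross_validate` only requires a `fit` and a `predict` callable, a random forest or any other classifier could be substituted for the threshold stand-in without changing the fold logic.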