Infectious Disease Relational Data Analysis Using String Grammar Non-Euclidean Relational Fuzzy C-Means.

Affiliations

Department of Computer Engineering, Faculty of Engineering, Graduate School, Chiang Mai University, Chiang Mai 50200, Thailand.

Department of Computer Engineering, Faculty of Engineering, Excellence Center in Infrastructure Technology and Transportation Engineering, Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand.

Publication information

Int J Environ Res Public Health. 2021 Aug 1;18(15):8153. doi: 10.3390/ijerph18158153.

Abstract

Statistical analysis of infectious diseases is becoming more important, especially for developing prevention policy. This requires epidemiology, the study of how disease occurrence relates to who is affected, when, and where. In this paper, we develop the string grammar non-Euclidean relational fuzzy C-means (sgNERF-CM) algorithm to find relationships within data on dengue fever, influenza, and hepatitis B virus (HBV) infection across all provinces of Thailand, viewed by age, occupation, and month. Dunn's index is used to select the best models because it identifies compact, well-separated clusters. We compare the sgNERF-CM results with those of the string grammar relational hard C-means (sgRHCM) algorithm. Their numerical counterparts, the relational hard C-means (RHCM) and non-Euclidean relational fuzzy C-means (NERF-CM) algorithms, are also included in the comparison. We found that the sgNERF-CM algorithm is far better than the numerical counterparts and better than the sgRHCM algorithm in most cases. The results show that the month-based dataset does not help in finding relationships, since the diseases tend to occur all year round. People in different age ranges and different regions of Thailand have different numbers of dengue fever infections. The occupations with a higher chance of dengue fever are the student and teacher groups in the central, north-east, north, and south regions. Additionally, students in all regions except the central region have a high risk of dengue infection. For the influenza dataset, people aged from more than 1 year to 64 years have a higher number of influenza infections in every province, and most occupations in all regions have a higher risk of contracting influenza. For the HBV dataset, people aged 10 to 65 years in all regions have a high risk of infection. In addition, only the farmer and general contractor groups in all regions have a high chance of contracting HBV.
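The model-selection step can be illustrated with a minimal sketch of Dunn's index computed from a pairwise dissimilarity (relational) matrix. The abstract does not state how the data are encoded as strings or which string distance is used, so the Levenshtein distance and the symbol strings below are assumptions made only for illustration. The index is taken in its classic form, the smallest between-cluster distance divided by the largest within-cluster diameter, so a larger value indicates more compact and better-separated clusters.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (assumed dissimilarity, not confirmed by the abstract)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def relational_matrix(strings):
    """Full pairwise dissimilarity matrix, the input a relational clustering method works on."""
    return [[levenshtein(s, t) for t in strings] for s in strings]


def dunn_index(dist, labels):
    """Classic Dunn's index: min inter-cluster distance / max intra-cluster diameter."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    max_diameter = max(
        max((dist[i][j] for i, j in combinations(members, 2)), default=0)
        for members in clusters.values()
    )
    min_separation = min(
        dist[i][j]
        for a, b in combinations(list(clusters.values()), 2)
        for i in a
        for j in b
    )
    if max_diameter == 0:
        return float("inf")  # every cluster is a single point or identical points
    return min_separation / max_diameter


# Hypothetical symbol strings standing in for encoded provincial profiles.
strings = ["aab", "aac", "abb", "zzy", "zzx"]
dist = relational_matrix(strings)

well_separated = [0, 0, 0, 1, 1]  # hard labels from one candidate model
mixed = [0, 1, 0, 1, 0]           # hard labels from a poorer candidate

print(dunn_index(dist, well_separated))
print(dunn_index(dist, mixed))
```

For a fuzzy method such as sgNERF-CM, one plausible way to obtain hard labels for this calculation is to assign each object to the cluster with its highest membership; whether the paper does this is not stated in the abstract.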

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7dc/8346127/201bd3e80232/ijerph-18-08153-g001.jpg
