Department of Computer Science, Universidad de Alcalá, Polytechnic School, Alcala de Henares, 28805, Spain.
Department of Nursing and Physiotherapy, Universidad de Alcalá, Faculty of Medicine and Health Sciences, Alcala de Henares, 28805, Spain.
Comput Biol Med. 2024 May;174:108469. doi: 10.1016/j.compbiomed.2024.108469. Epub 2024 Apr 14.
This research addresses the problem of detecting acute respiratory infections, urinary tract infections, and other infectious diseases in elderly nursing home residents using machine learning algorithms. The study analyzes data from multiple vital signs, together with other contextual information, for diagnostic purposes. Daily data collection is subject to sampling constraints caused by weekends, holidays, shift changes, staff turnover, and equipment breakdowns, which produce numerous nulls, repeated readings, outliers, and meaningless values. The resulting time series are also short, which prevents the extraction of seasonal information or consistent trends. Because data are collected blindly, most records come from periods when residents are healthy, so the data are highly imbalanced. This study proposes a data cleaning process and then builds a mechanism that reproduces the basal activity of each resident to improve disease classification. The results show that machine learning techniques assisted by the proposed basal module can anticipate diagnoses 2, 3, or 4 days before physicians decide to start antibiotic treatment, achieving an area under the curve (AUC) of 0.857. The contributions of this work are: (1) a new data cleaning process; (2) the analysis of contextual information to improve data quality; (3) the generation of a baseline measure for relative comparison; and (4) the use of either binary (disease/no disease) or multiclass classification, which distinguishes among types of infection and shows the advantages of multiclass over binary classification. From a medical point of view, the early detection of infectious diseases in institutionalized individuals is novel.
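The abstract does not specify the cleaning rules or how the basal module is built. As a minimal illustration only, the Python sketch below shows one way such steps could look for a single resident and a single vital sign. It assumes daily `(day, value)` readings, hypothetical plausibility limits (`PLAUSIBLE`), and a baseline equal to the median of readings on days labeled healthy. The function names (`clean_readings`, `basal_value`, `deviation_from_baseline`) and all thresholds are hypothetical and are not the authors' method.

```python
from statistics import median

# HYPOTHETICAL plausibility limits per vital sign (assumed, not from the study).
PLAUSIBLE = {
    "temperature": (34.0, 42.0),  # degrees Celsius
    "heart_rate": (30.0, 200.0),  # beats per minute
    "spo2": (70.0, 100.0),        # oxygen saturation, percent
}


def clean_readings(readings, sign):
    """Return one plausible value per day from (day, value) pairs.

    Nulls and values outside the (assumed) plausibility range are dropped;
    when a day has repeated readings, only the first one is kept.
    """
    low, high = PLAUSIBLE[sign]
    by_day = {}
    for day, value in readings:
        if value is None or not (low <= value <= high):
            continue  # null or meaningless value
        by_day.setdefault(day, value)  # keep the first reading of each day
    return sorted(by_day.items())


def basal_value(cleaned, healthy_days):
    """Hypothetical baseline: median of readings on days assumed healthy."""
    values = [v for day, v in cleaned if day in healthy_days]
    if not values:
        raise ValueError("no healthy-period readings to build a baseline")
    return median(values)


def deviation_from_baseline(cleaned, baseline):
    """Express each reading relative to the resident's own baseline."""
    return [(day, v - baseline) for day, v in cleaned]


if __name__ == "__main__":
    raw = [(1, 36.5), (1, 36.9), (2, None), (3, 36.7),
           (4, 45.0), (5, 36.6), (6, 38.2)]
    cleaned = clean_readings(raw, "temperature")
    print(cleaned)  # → [(1, 36.5), (3, 36.7), (5, 36.6), (6, 38.2)]
    base = basal_value(cleaned, healthy_days={1, 3, 5})
    print(base)  # → 36.6
    for day, dev in deviation_from_baseline(cleaned, base):
        print(day, round(dev, 2))
```

Statistical outlier removal is deliberately left out of this sketch. On a short series, a robust z-score filter can discard exactly the febrile readings that signal infection (in the example above, 38.2 °C would be flagged). Any such filter would therefore need to be fitted on healthy-period data only.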