JRC-COMBINE, RWTH Aachen University, MTZ, Pauwelsstrasse 19, Level 3, 52074, Aachen, Germany.
Pharmacometrics / Modeling and Simulation, Bayer AG - Pharmaceuticals, Leverkusen, Germany.
Sci Rep. 2023 Mar 11;13(1):4053. doi: 10.1038/s41598-023-30986-1.
Electronic health records (EHRs) are used in hospitals to store diagnoses, clinician notes, examinations, lab results, and interventions for each patient. Grouping patients into distinct subsets, for example, via clustering, may enable the discovery of unknown disease patterns or comorbidities, which could eventually lead to better treatment through personalized medicine. Patient data derived from EHRs is heterogeneous and temporally irregular. Therefore, traditional machine learning methods like PCA are ill-suited for analysis of EHR-derived patient data. We propose to address these issues with a new methodology based on training a gated recurrent unit (GRU) autoencoder directly on health record data. Our method learns a low-dimensional feature space by training on patient data time series, where the time of each data point is expressed explicitly. We use positional encodings for time, allowing our model to better handle the temporal irregularity of the data. We apply our method to data from the Medical Information Mart for Intensive Care (MIMIC-III). Using our data-derived feature space, we can cluster patients into groups representing major classes of disease patterns. Additionally, we show that our feature space exhibits a rich substructure at multiple scales.
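As an illustration of how the time of each data point can be expressed explicitly, the following is a minimal sketch of a positional encoding applied to irregular timestamps. The abstract does not specify the form of the encoding, so the sinusoidal scheme, the function name `time_positional_encoding`, the parameters `dim` and `max_period`, and the example measurements are all assumptions made for illustration, not the authors' implementation.

```python
import math


def time_positional_encoding(times, dim=8, max_period=10000.0):
    """Sinusoidal encoding of real-valued timestamps (assumed scheme).

    times: iterable of floats, e.g. hours since admission (hypothetical unit).
    Returns one list of `dim` values in [-1, 1] per timestamp.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encodings = []
    for t in times:
        row = []
        for i in range(dim // 2):
            # Frequencies decrease geometrically with the index i.
            freq = max_period ** (-2.0 * i / dim)
            row.append(math.sin(t * freq))
            row.append(math.cos(t * freq))
        encodings.append(row)
    return encodings


# Hypothetical, irregularly sampled measurements: (hours since admission, value).
events = [(0.0, 7.31), (0.5, 7.35), (6.0, 7.40), (30.0, 7.38)]
times = [t for t, _ in events]
pe = time_positional_encoding(times, dim=8)

# Each sequence step pairs the measured value with the encoding of its time,
# so a recurrent model sees when a value was recorded, not only its order.
inputs = [[value] + enc for (_, value), enc in zip(events, pe)]
```

Because the encoding depends on the actual timestamp rather than on the index in the sequence, two measurements taken 30 minutes apart and two taken 24 hours apart produce different inputs even when they are adjacent steps. Sequences built this way could then be fed to a recurrent autoencoder such as the GRU autoencoder described above.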