
BACKGROUND

Imputing missing values is a key step in mining Electronic Health Records (EHR), because it can significantly affect the conclusions drawn from downstream analyses in translational medicine. Laboratory values in EHR are not missing at random, yet imputation techniques tend to disregard this distinction. Developing an adaptive imputation strategy designed specifically for EHR is therefore an important step toward mitigating data imbalance and enhancing the predictive power of modeling tools for healthcare applications.

METHOD

We analyzed laboratory measures derived from Geisinger's EHR for patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted laboratory measures identified by Logical Observation Identifiers Names and Codes (LOINC) and excluded those with 75% or more missingness. Comorbidities, primary and secondary diagnoses, and active problem lists were also extracted. The adaptive imputation strategy was designed as a hybrid approach: the comorbidity patterns of patients were transformed into latent patterns and then clustered, and imputation was performed within each cluster of patients. This was done for each cohort independently to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating the information from comorbidity patterns.
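
The steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the latent transformation, the clustering algorithm, or the imputer, so the sketch assumes a truncated SVD for the latent patterns, a small k-means for the clustering, and a per-cluster column mean as the imputer. All function names are hypothetical.

```python
import numpy as np

def drop_sparse_labs(labs, max_missing=0.75):
    """Keep lab columns whose missing fraction is below max_missing
    (the abstract excludes measures with 75% or more missingness)."""
    missing_frac = np.isnan(labs).mean(axis=0)
    keep = missing_frac < max_missing
    return labs[:, keep], keep

def latent_patterns(comorbid, n_components=2):
    """Assumed latent transform: project centered comorbidity
    indicators onto their top singular vectors."""
    centered = comorbid - comorbid.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(points, k, n_iter=50, seed=0):
    """Assumed clustering step: a minimal k-means returning one label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def impute_by_cluster(labs, labels):
    """Assumed imputer: fill each missing value with the column mean of the
    patient's cluster, falling back to the overall column mean when the
    cluster has no observed value for that column."""
    filled = labs.copy()
    overall = np.nanmean(labs, axis=0)
    for j in np.unique(labels):
        rows = labels == j
        block = labs[rows]
        observed = ~np.isnan(block)
        counts = observed.sum(axis=0)
        sums = np.where(observed, block, 0.0).sum(axis=0)
        means = np.where(counts > 0, sums / np.maximum(counts, 1), overall)
        filled[rows] = np.where(observed, block, means)
    return filled
```

A typical call sequence would be `kept, _ = drop_sparse_labs(labs)`, then `labels = kmeans(latent_patterns(comorbid), k)`, then `impute_by_cluster(kept, labels)`, where `labs` is a patients-by-measures array with `NaN` for missing values and `comorbid` is a patients-by-diagnosis indicator array. The baseline in the comparison corresponds to running the same imputer with every patient in a single cluster.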

RESULTS

We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for Cdiff infection). We extracted 495 LOINC codes and, based on the primary/secondary diagnoses and active problem lists in the EHR, 11,230 diagnosis codes for the IBD cohort, 8,160 diagnosis codes for the Cdiff cohort, and 2,042 diagnosis codes for the OA cohort. Overall, the greatest improvement from this strategy was observed when the laboratory measures had a higher level of missingness. The best root mean square error (RMSE) difference for each dataset was recorded as -35.5 for the Cdiff, -8.3 for the IBD, and -11.3 for the OA dataset.
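
To make the RMSE difference concrete, the following minimal sketch shows one way such a number could be computed. It rests on assumptions the abstract does not confirm: that known laboratory values are hidden, imputed by both strategies, and compared against the hidden truth, and that the difference is taken as adaptive minus baseline, so a negative value means the adaptive strategy had the lower error. The function names are hypothetical.

```python
import numpy as np

def rmse(truth, estimate):
    """Root mean square error between held-out true values and imputed values."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))

def rmse_difference(truth, adaptive_estimate, baseline_estimate):
    """Assumed sign convention: adaptive RMSE minus baseline RMSE.
    Negative values indicate lower error for the adaptive strategy."""
    return rmse(truth, adaptive_estimate) - rmse(truth, baseline_estimate)

# Toy values for illustration only, not data from the study.
held_out = [1.0, 2.0, 3.0]
adaptive = [1.0, 2.0, 4.0]
baseline = [2.0, 3.0, 5.0]
difference = rmse_difference(held_out, adaptive, baseline)
```

With these toy values the adaptive error is the square root of 1/3 and the baseline error is the square root of 2, so `difference` is negative.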

CONCLUSIONS

An adaptive imputation strategy designed specifically for EHR that uses complementary information from the clinical profile of the patient can be used to improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.

Increasing the Density of Laboratory Measures for Machine Learning Applications.
