lab: an R package for generating analysis-ready data from laboratory records.

Authors

Tseng Yi-Ju, Chen Chun Ju, Chang Chia Wei

Affiliations

Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States of America.

Publication information

PeerJ Comput Sci. 2023 Aug 25;9:e1528. doi: 10.7717/peerj-cs.1528. eCollection 2023.

Abstract

BACKGROUND

Electronic health records (EHRs) play a crucial role in healthcare decision-making by giving physicians insight into disease progression and suitable treatment options. Within EHRs, laboratory test results are frequently used to predict disease progression. However, processing laboratory test results is often challenging because units and formats vary. In addition, leveraging the temporal information in EHRs can improve outcome, prognosis, and diagnosis prediction. Nevertheless, data in these records are captured at irregular intervals and must be preprocessed, which adds complexity to time-series analyses.

METHODS

To address these challenges, we developed an open-source R package that facilitates the extraction of temporal information from laboratory records. The proposed package generates analysis-ready time-series data by segmenting the data into time-series windows and imputing missing values. Users can also map local laboratory codes to the Logical Observation Identifier Names and Codes (LOINC), an international standard. This mapping allows users to incorporate additional information, such as reference ranges and related diseases, and the reference ranges provided by LOINC make it possible to categorize results as normal or abnormal. Finally, the analysis-ready time-series data can be summarized using descriptive statistics and used to develop machine learning models.
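
The windowing, imputation, and range-based categorization steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the lab package's R interface: every function name, the 7-day window width, the carry-forward imputation rule, the local code `GLU_LOCAL`, the standard code `STD-GLUCOSE`, and the reference range are assumptions invented for the example.

```python
from datetime import date
from statistics import mean

# Hypothetical local-code -> standard-code map and reference ranges.
# Placeholder values for illustration only, not real LOINC content.
LOCAL_TO_STANDARD = {"GLU_LOCAL": "STD-GLUCOSE"}
REFERENCE_RANGE = {"STD-GLUCOSE": (70.0, 100.0)}


def segment_into_windows(records, index_date, window_days, n_windows):
    """Average results per fixed-width window counted from index_date.

    records: list of (date, value) pairs for one patient and one lab test.
    Returns a list of length n_windows; windows with no result hold None.
    """
    buckets = [[] for _ in range(n_windows)]
    for taken_on, value in records:
        offset = (taken_on - index_date).days
        if offset < 0:
            continue  # ignore results recorded before the index date
        window = offset // window_days
        if window < n_windows:
            buckets[window].append(value)
    return [mean(b) if b else None for b in buckets]


def impute_forward(series):
    """Fill None with the most recent earlier value (leading gaps stay None)."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def flag_abnormal(local_code, value):
    """Map a local code to its standard code and compare value with the range."""
    if value is None:
        return None  # nothing to categorize when no value is available
    low, high = REFERENCE_RANGE[LOCAL_TO_STANDARD[local_code]]
    return "normal" if low <= value <= high else "abnormal"


# Toy records for one patient: three results at irregular intervals.
records = [
    (date(2023, 1, 2), 80.0),
    (date(2023, 1, 4), 100.0),
    (date(2023, 1, 20), 130.0),
]
windows = segment_into_windows(records, date(2023, 1, 1), window_days=7, n_windows=4)
imputed = impute_forward(windows)
flags = [flag_abnormal("GLU_LOCAL", v) for v in imputed]
print(windows)
print(imputed)
print(flags)
```

Averaging within a window and carrying the last value forward are only one possible pair of choices; other summary statistics or imputation rules would slot into the same structure.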

RESULTS

Using the package, we analyzed data from MIMIC-III, focusing on newborns with patent ductus arteriosus (PDA). We extracted time-series laboratory records and compared test results between patients with and without 30-day in-hospital mortality. We then identified significant variations in several laboratory test results 7 days after PDA diagnosis. Using the analysis-ready time-series data, we trained a prediction model with the long short-term memory algorithm, achieving an area under the receiver operating characteristic curve of 0.83 for predicting 30-day in-hospital mortality in model training. These findings demonstrate the lab package's effectiveness in analyzing disease progression.
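
For readers unfamiliar with the metric reported above, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The Python sketch below computes it by direct pairwise comparison. The function name and the toy labels and scores are assumptions for illustration; they are not the study's data or model outputs.

```python
def auroc(labels, scores):
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. died), 0 for the negative class.
    scores: predicted risk for each case; ties count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four cases, two of each class.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3]
result = auroc(labels, scores)
print(result)
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; rank-based implementations are the usual choice for larger datasets.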

CONCLUSIONS

The proposed package simplifies and expedites the workflow involved in laboratory records extraction. This tool is particularly valuable in assisting clinical data analysts in overcoming the obstacles associated with heterogeneous and sparse laboratory records.
