An evaluation of time series summary statistics as features for clinical prediction tasks.

Affiliations

Institute of Systems Engineering, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, 116024, People's Republic of China.

Health Management Center, The First Affiliated Hospital of Zhengzhou University, No. 1 Longhu central ring road, Zhengzhou, 450052, People's Republic of China.

Publication information

BMC Med Inform Decis Mak. 2020 Mar 5;20(1):48. doi: 10.1186/s12911-020-1063-x.

Abstract

BACKGROUND

Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. Existing clinical prediction studies have mainly used simple summary statistics to summarize information from physiological time series. However, relying on so few statistics discards information. In addition, using only the maximum and minimum to represent patient features fails to provide an adequate explanation. Few studies have evaluated which summary statistics best represent physiological time series.

METHODS

In this paper, we summarize 14 statistics describing the characteristics of physiological time series, including the central tendency, dispersion tendency, and distribution shape. Then, we evaluate the use of summary statistics of physiological time series as features for three clinical prediction tasks. To find the combinations of statistics that yield the best performance on each task, we use a cross-validation-based genetic algorithm to approximate the optimal combination of statistics.
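As a rough illustration of turning one physiological time series into summary-statistic features, the sketch below computes a handful of statistics from the three families named above. The abstract does not list all 14 statistics, so the subset here, the function name `summarize`, and the heart-rate values are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def summarize(series):
    """Return illustrative summary statistics for one time series.

    Hypothetical helper: the statistics chosen here are an assumed subset,
    not the paper's exact list of 14.
    """
    x = np.asarray(series, dtype=float)
    x = x[~np.isnan(x)]  # drop missing measurements
    if x.size == 0:
        raise ValueError("series has no observed values")

    mean = x.mean()
    std = x.std()  # population standard deviation
    centered = x - mean
    if std > 0:
        # Moment-based skewness and excess kurtosis
        skewness = np.mean(centered ** 3) / std ** 3
        kurtosis = np.mean(centered ** 4) / std ** 4 - 3.0
    else:
        skewness = 0.0
        kurtosis = 0.0

    q25, median, q75 = np.percentile(x, [25, 50, 75])
    return {
        # central tendency
        "mean": float(mean), "median": float(median),
        "q25": float(q25), "q75": float(q75),
        # dispersion
        "min": float(x.min()), "max": float(x.max()),
        "range": float(x.max() - x.min()), "std": float(std),
        # distribution shape
        "skewness": float(skewness), "kurtosis": float(kurtosis),
    }

# Made-up heart-rate readings (beats per minute) for one patient
heart_rate = [72, 75, 71, 90, 110, 74]
features = summarize(heart_rate)
print(features["mean"], features["range"])
```

Each patient would then be represented by concatenating such feature dictionaries across all monitored variables.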

RESULTS

In experiments on the electronic health records (EHRs) of 6,927 patients, we obtained prediction results for single statistics and for commonly used combinations of statistics on three clinical prediction tasks. Using a genetic algorithm with embedded cross-validation, we obtained 25 optimal combinations of statistics and tested their predictive performance. Comparing the single statistics and commonly used combinations against a quantitative analysis of the optimal combinations, we found that some statistics play central roles in patient representation and that the different prediction tasks share certain commonalities.
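The search for good combinations of statistics can be pictured as a genetic algorithm over binary masks, where each bit says whether a statistic is included and fitness is a cross-validated score. The sketch below is a minimal version under stated assumptions: the least-squares linear classifier, accuracy as the fitness measure, the GA settings, and the synthetic data are all placeholders chosen for illustration, since the abstract does not specify the model or the GA configuration the authors used.

```python
import numpy as np

def cv_accuracy(X, y, mask, n_folds=5):
    """Mean k-fold accuracy of a least-squares linear classifier
    trained on the columns selected by the boolean mask.
    Assumes rows are already in random order."""
    if not mask.any():
        return 0.0  # an empty feature set cannot predict anything
    Xs = np.column_stack([np.ones(len(X)), X[:, mask]])  # add intercept
    scores = []
    for test_idx in np.array_split(np.arange(len(X)), n_folds):
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        # Fit on labels recoded to -1/+1, then classify by sign
        w, *_ = np.linalg.lstsq(Xs[train], 2.0 * y[train] - 1.0, rcond=None)
        pred = (Xs[test_idx] @ w > 0).astype(int)
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))

def genetic_search(X, y, pop_size=20, generations=15,
                   mutation_rate=0.1, seed=0):
    """Return (best_mask, best_score) found by a simple GA."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5  # random initial masks
    best_mask, best_score = None, -1.0
    for _ in range(generations):
        fitness = np.array([cv_accuracy(X, y, ind) for ind in pop])
        top = int(np.argmax(fitness))
        if fitness[top] > best_score:
            best_mask, best_score = pop[top].copy(), float(fitness[top])
        children = [best_mask.copy()]  # elitism: keep the best mask so far
        while len(children) < pop_size:
            # Binary tournament selection for each parent
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if fitness[a] >= fitness[b] else pop[b]
            c, d = rng.integers(pop_size, size=2)
            p2 = pop[c] if fitness[c] >= fitness[d] else pop[d]
            # Uniform crossover followed by bit-flip mutation
            child = np.where(rng.random(n_features) < 0.5, p1, p2)
            flip = rng.random(n_features) < mutation_rate
            children.append(child ^ flip)
        pop = np.array(children)
    return best_mask, best_score

# Synthetic example: only columns 0 and 2 carry signal
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 2] > 0).astype(int)
best_mask, best_score = genetic_search(X, y)
print(best_mask, round(best_score, 3))
```

In a real study each column would hold one summary statistic of one physiological variable, and the fitness function would use the task's own model and metric.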

CONCLUSION

Through an in-depth analysis of the results, we found many practical reference points that can guide subsequent related research. Statistics that indicate dispersion tendency, such as min, max, and range, are more suitable for length-of-stay prediction, and they also provide information for short-term mortality prediction. The mean and quantiles, which reflect the central tendency of physiological time series, are more suitable for mortality and disease prediction. Skewness and kurtosis perform poorly when used on their own but can serve as supplementary statistics that improve overall predictive performance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b0f/7059727/f043ab6e7415/12911_2020_1063_Fig1_HTML.jpg
