Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
SIB Swiss Institute of Bioinformatics, Switzerland.
Bioinformatics. 2018 Jul 1;34(13):i438-i446. doi: 10.1093/bioinformatics/bty246.
Most modern intensive care units record the physiological and vital signs of patients. These data can be used to extract signatures, commonly known as biomarkers, that help physicians understand the biological complexity of many syndromes. However, most biomarkers suffer from either poor predictive performance or weak explanatory power. Recent developments in time series classification focus on discovering shapelets, i.e. subsequences that are most predictive of class membership. Shapelets have the advantage of combining high predictive performance with an interpretable component: their shape. Currently, most shapelet discovery methods do not rely on statistical tests to verify the significance of individual shapelets. Identifying statistically significant associations between the shapelets of physiological biomarkers and patients who exhibit certain phenotypes of interest would therefore enable the discovery and subsequent ranking of physiological signatures that are interpretable, statistically validated and accurate predictors of clinical endpoints.
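To make the notion of a shapelet concrete, the following minimal sketch computes the distance between a candidate shapelet and a time series as the smallest distance over all sliding windows of the series. The function names, the choice of Euclidean distance and the `threshold` parameter are assumptions made for illustration; they are not taken from the S3M code base.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any
    equally long contiguous window of `series`."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet over every possible start position.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def contains_shapelet(series, shapelet, threshold):
    """Hypothetical helper: a series 'contains' the shapelet if its
    best-matching window lies within `threshold` of it."""
    return shapelet_distance(series, shapelet) <= threshold
```

Under these assumptions, thresholding the distance turns each candidate shapelet into a binary feature (present or absent) per patient, which can then be cross-tabulated against the class label.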
We present a novel and scalable method for scanning time series and identifying discriminative patterns that are statistically significant. The significance of a shapelet is evaluated while accounting for the multiple hypothesis testing problem, which we mitigate by efficiently pruning untestable shapelet candidates with Tarone's method. We demonstrate the utility of our method by discovering patterns in three of a patient's vital signs (heart rate, respiratory rate and systolic blood pressure) that are indicators of the severity of a future sepsis event, i.e. an inflammatory response to an infective agent that can lead to organ failure and death if not treated in time.
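The pruning idea behind Tarone's method can be sketched as follows. For a 2x2 contingency table with fixed margins, a discrete test such as Fisher's exact test has a minimum attainable p-value; a candidate whose minimum attainable p-value exceeds the corrected threshold can never be significant and need not be counted in the correction. The sketch below assumes a two-sided Fisher's exact test and hypothetical function names; it illustrates the general technique rather than the S3M implementation.

```python
from math import comb


def min_attainable_pvalue(n, n1, x):
    """Minimum two-sided Fisher p-value over all 2x2 tables with
    n samples, n1 of them in the positive class, and x samples
    containing the pattern (margins are fixed)."""
    lo, hi = max(0, x + n1 - n), min(x, n1)
    # Hypergeometric probability of each admissible table.
    probs = [comb(n1, a) * comb(n - n1, x - a) / comb(n, x)
             for a in range(lo, hi + 1)]

    def pvalue(p_obs):
        # Sum over tables at least as extreme (small tolerance for ties).
        return sum(p for p in probs if p <= p_obs * (1 + 1e-9))

    return min(pvalue(p) for p in probs)


def tarone_threshold(min_pvalues, alpha=0.05):
    """Smallest k such that at most k hypotheses are testable at
    level alpha / k; returns (alpha / k, k)."""
    k = 1
    while sum(1 for p in min_pvalues if p <= alpha / k) > k:
        k += 1
    return alpha / k, k
```

For example, with three candidates whose minimum attainable p-value is 0.001 and one hundred candidates that can never fall below 1.0, `tarone_threshold` yields a correction factor of 3 rather than the Bonferroni factor of 103, because the untestable candidates are excluded from the count.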
We make our method and the scripts that are required to reproduce the experiments publicly available at https://github.com/BorgwardtLab/S3M.
Supplementary data are available at Bioinformatics online.