IEEE J Biomed Health Inform. 2019 Jan;23(1):59-65. doi: 10.1109/JBHI.2018.2832610. Epub 2018 May 2.
Real-time analysis of streaming physiological data to identify abnormal conditions earlier is an important aspect of precision medicine. However, open-source systems supporting this workflow are lacking. In this paper, we present PhysOnline, a pipeline built on the open-source Apache Spark platform that ingests streaming physiological data for online feature extraction and machine learning. We consider scalability factors for horizontal deployment to support growing analysis requirements. We further integrate real-time feature extraction, including pattern recognition methods and descriptive statistical components, to identify temporal characteristics of waveform signals. The generated features are then used for machine learning and for real-time classification of abnormal conditions. As a case study, we present the online classification of electrocardiography recordings to screen for Paroxysmal Atrial Fibrillation (PAF) and demonstrate that our pipeline can predict which persons will develop PAF at least 45 min before an episode of that condition. This pipeline can be applied in domains where pattern matching, temporal abstractions, and morphological characteristics can be used for real-time classification of streaming time-series data.
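To make the "descriptive statistical components" of the feature-extraction step concrete, the following is a minimal sketch in plain Python of sliding-window statistics over a stream of samples. It is not the PhysOnline implementation and does not use Apache Spark. The function name `windowed_features`, the window size, and the RR-interval values are all assumptions introduced for illustration only.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Dict, Iterable, Iterator


def windowed_features(samples: Iterable[float], window: int = 4) -> Iterator[Dict[str, float]]:
    """Yield descriptive statistics for each full sliding window of samples.

    Hypothetical helper: the abstract does not specify which statistics
    or window sizes the pipeline uses.
    """
    buf: Deque[float] = deque(maxlen=window)  # keeps only the latest `window` samples
    for x in samples:
        buf.append(x)
        if len(buf) == window:  # emit features only once the window is full
            yield {
                "mean": mean(buf),
                "std": pstdev(buf),  # population standard deviation of the window
                "min": min(buf),
                "max": max(buf),
                "range": max(buf) - min(buf),
            }


# Made-up RR intervals in seconds, used purely as example input.
rr_intervals = [0.80, 0.82, 0.78, 0.80, 1.10, 0.60]
features = list(windowed_features(rr_intervals, window=4))
```

Each emitted dictionary could serve as one feature vector for a downstream classifier. In a distributed deployment, the same per-window computation would be expressed through the streaming platform's own windowing operators rather than an in-process buffer.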