Vega Julio, Li Meng, Aguillera Kwesi, Goel Nikunj, Joshi Echhit, Khandekar Kirtiraj, Durica Krina C, Kunta Abhineeth R, Low Carissa A
Department of Medicine, University of Pittsburgh, Pittsburgh, PA, United States.
Front Digit Health. 2021 Nov 18;3:769823. doi: 10.3389/fdgth.2021.769823. eCollection 2021.
Smartphones and wearable devices are widely used in behavioral and clinical research to collect longitudinal data that, along with ground truth data, are used to create models of human behavior. Mobile sensing researchers often write data processing and analysis code from scratch, even though many research teams collect data from similar mobile sensors, platforms, and devices. This leads to significant inefficiency because researchers cannot replicate and build on others' work, to inconsistent quality of code and results, and to a lack of transparency when code is not shared alongside publications. We provide an overview of Reproducible Analysis Pipeline for Data Streams (RAPIDS), a reproducible pipeline that standardizes the preprocessing, feature extraction, analysis, visualization, and reporting of data streams coming from mobile sensors. RAPIDS consists of a set of R and Python scripts that are executed on top of reproducible virtual environments, orchestrated by a workflow management system, and organized following a consistent file structure for data science projects. We share open source, documented, extensible, and tested code to preprocess, extract, and visualize behavioral features from data collected with any Android or iOS smartphone sensing app as well as Fitbit and Empatica wearable devices. RAPIDS allows researchers to process mobile sensor data in a rigorous and reproducible way. This saves time and effort during the data analysis phase of a project and facilitates sharing analysis workflows alongside publications.
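The staged flow the abstract describes (preprocessing followed by behavioral feature extraction) can be illustrated with a minimal sketch. This is not RAPIDS code and does not use its API: the event layout, the sample rows, and the function names `preprocess` and `extract_daily_unlocks` are hypothetical, chosen only to show how a raw sensor stream might be turned into a per-day feature.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw screen events: (participant_id, unix_timestamp_seconds, event_name).
# The layout is an assumption for illustration, not a RAPIDS data format.
RAW_EVENTS = [
    ("p01", 1609459200, "unlock"),  # 2021-01-01 00:00:00 UTC
    ("p01", 1609459200, "unlock"),  # exact duplicate of the row above
    ("p01", 1609462800, "unlock"),  # 2021-01-01 01:00:00 UTC
    ("p01", 1609545600, "unlock"),  # 2021-01-02 00:00:00 UTC
    ("p02", 1609459260, "lock"),    # not an unlock, so it is not counted
]


def preprocess(rows):
    """Preprocessing step: drop exact duplicate rows, then sort them."""
    return sorted(set(rows))


def extract_daily_unlocks(rows):
    """Feature extraction step: count unlock events per participant per UTC day."""
    counts = defaultdict(int)
    for participant, timestamp, event in rows:
        if event == "unlock":
            day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
            counts[(participant, day)] += 1
    return dict(counts)


features = extract_daily_unlocks(preprocess(RAW_EVENTS))
print(features)
```

Keeping each stage as a separate function with explicit inputs and outputs is what lets a workflow manager rerun only the stages whose inputs changed; in a real pipeline each stage would read and write files rather than in-memory lists.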