Cho Peter J, Olaye Iredia M, Shandhi Md Mobashir Hasan, Daza Eric J, Foschini Luca, Dunn Jessilyn P
Biomedical Engineering Department, Duke University, Durham, NC, USA.
Evidation Health, San Mateo, CA, USA.
Lancet Digit Health. 2025 Jan;7(1):e23-e34. doi: 10.1016/S2589-7500(24)00219-X.
Longitudinal digital health studies combine passively collected information from digital devices, such as commercial wearable devices, and actively contributed data, such as surveys, from participants. Although widespread smartphone use and internet access support the development of these studies, collecting representative data remains challenging because of low adherence and retention. We aimed to identify key factors related to adherence and retention in digital health studies and to develop a methodology for identifying factors that are associated with, and might affect, study participant engagement.
In this exploratory secondary analysis, we used data from two separate prospective longitudinal digital health studies, conducted among adult participants (age ≥18 years) during the COVID-19 pandemic by the BIG IDEAs Laboratory (BIL) at Duke University (Durham, NC, USA; April 2, 2020 to May 25, 2021) and Evidation Health (San Mateo, CA, USA; April 4 to Aug 31, 2020). Prospective daily or weekly surveys were administered for up to 15 months in the BIL study and daily surveys were administered for 5 months in the Evidation Health study. We defined metrics related to adherence to assess how participants engage with longitudinal digital health studies and developed models to infer how demographic factors and the day of survey delivery might be associated with these metrics. We defined retention as the time until a participant drops out of the study. For the purpose of clustering analysis, we defined three metrics of survey adherence: (1) total number of surveys completed, (2) participation regularity (ie, frequency of filling out surveys consecutively), and (3) time of activity (ie, engagement pattern relative to enrolment time). We assessed these metrics and explored differences by age, sex, race, and day of survey delivery. We analysed the data by unsupervised clustering, survival analysis, and recurrent event analysis with multistate modelling, with analyses restricted to individuals who provided data on age, sex, and race.
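The three adherence metrics can be illustrated with a minimal sketch. The abstract does not give the exact operational definitions, so the formulas below are assumptions made for illustration only: regularity is taken as the share of completed surveys that directly follow a completed survey on the previous day, and time of activity as the mean completion day expressed as a fraction of the study length. The names `AdherenceMetrics` and `adherence_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdherenceMetrics:
    total_completed: int       # metric 1: total number of surveys completed
    regularity: float          # metric 2 (assumed form): share of surveys completed on consecutive days
    mean_activity_time: float  # metric 3 (assumed form): mean completion day / study length


def adherence_metrics(completed_days, study_length_days):
    """Summarise one participant's survey history.

    completed_days: day offsets from enrolment (0 = enrolment day) on which
    a survey was completed. Duplicate days are counted once.
    """
    days = sorted(set(completed_days))
    if not days:
        return AdherenceMetrics(0, 0.0, 0.0)
    day_set = set(days)
    # A survey counts as "consecutive" if the previous day also has a survey.
    consecutive = sum(1 for d in days if (d - 1) in day_set)
    regularity = consecutive / len(days)
    # Values near 0 mean activity concentrated shortly after enrolment.
    mean_activity_time = (sum(days) / len(days)) / study_length_days
    return AdherenceMetrics(len(days), regularity, mean_activity_time)


example = adherence_metrics([0, 1, 2, 10], study_length_days=20)
print(example)
```

Per-participant vectors of this kind could then be passed to any unsupervised clustering routine; the abstract does not state which clustering algorithm was used.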
In the BIL study, 5784 unique participants with the required demographic data completed 388 600 unique daily surveys (mean 67 [SD 90] surveys per participant). In the Evidation Health study, 89 479 unique participants with the required demographic data completed 2 080 992 unique daily surveys (mean 23 [SD 32] surveys per participant). Participants were grouped into adherence clusters based on the three metrics of adherence, and we identified statistically discernible differences in age, race, and sex between clusters. Most of the individuals aged 18-29 years were observed in the clusters with low or medium adherence, whereas the oldest age group (≥60 years) was generally more represented in clusters with high adherence than younger age groups. For retention, survival analysis indicated that 18-29 years was the age group with the highest risk of exiting the study at any given point in time (BIL study, hazard ratio [HR] for 18-29 years vs ≥60 years, 1·69 [95% CI 1·53-1·86]; p<0·0001; Evidation Health study, HR 1·50 [1·47-1·53]; p<0·0001). Sex and race were not discernible predictors of retention in the BIL study. In the Evidation Health study, male participants (vs female participants; HR 0·96 [0·94-0·98]; p<0·0001) and White participants (vs Asian participants; HR 0·96 [0·93-0·98]; p=0·0004) had a lower risk of study exit, and Other race participants (vs Asian participants) had a higher risk of study exit (HR 1·10 [1·06-1·14]; p<0·0001). Recurrent event analysis confirmed age as the factor most associated with adherence; for the 18-29 years age group (vs ≥60 years group), the transition intensity from an active to inactive state per day in the BIL study was 1·661 (95% CI 1·606-1·718) and in the Evidation Health study was 1·108 (1·094-1·121). Participation patterns were variable by race and sex between the studies.
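As a reading aid for the reported hazard ratios, the sketch below recovers the approximate standard error of the log hazard ratio from a published 95% CI. It assumes the interval is a symmetric Wald interval on the log scale (z = 1·96), which is the usual convention for Cox-type models but is not stated in the abstract. The function names are hypothetical.

```python
import math


def log_hr_standard_error(lower, upper, z=1.96):
    """Approximate SE of log(HR), assuming a Wald CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def geometric_midpoint(lower, upper):
    """Point estimate implied by a log-symmetric interval."""
    return math.sqrt(lower * upper)


# BIL study, 18-29 years vs >=60 years: HR 1.69 (95% CI 1.53-1.86)
se = log_hr_standard_error(1.53, 1.86)
mid = geometric_midpoint(1.53, 1.86)
print(f"SE of log HR ~ {se:.3f}; implied point estimate ~ {mid:.2f}")
```

If the assumption holds, the implied point estimate should sit close to the reported HR, which offers a quick consistency check on transcribed intervals.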
Our analyses revealed that age was consistently associated with adherence and retention, with younger participants having lower adherence and higher dropout rates than older participants. Unsupervised clustering and survival analyses are established methods in this field, whereas our use of recurrent event analysis was, to our knowledge, the first application of this method to remote digital health data. These methods can help to understand participant engagement in digital health studies, supporting targeted measures to improve adherence and retention.
US National Science Foundation, US National Institutes of Health, Microsoft AI for Health, Duke Clinical and Translational Science Institute, North Carolina Biotechnology Center, Duke MEDx, Duke Bass Connections, Duke Margolis Center for Health Policy, and Duke Office of Information Technology.