Lifelog Data-Based Prediction Model of Digital Health Care App Customer Churn: Retrospective Observational Study.

Affiliations

Department of Biomedical Systems Informatics, College of Medicine, Yonsei University, Seoul, Republic of Korea.

Department of Information Medicine, Asan Medical Center, College of Medicine, University of Ulsan, Seoul, Republic of Korea.

Publication information

J Med Internet Res. 2021 Jan 6;23(1):e22184. doi: 10.2196/22184.

Abstract

BACKGROUND

Customer churn is the rate at which customers stop doing business with an entity. In the field of digital health care, user churn prediction is important not only in terms of company revenue but also for improving the health of users. Churn prediction has been previously studied, but most studies applied time-invariant model structures and used structured data. However, additional unstructured data have become available; therefore, it has become essential to process daily time-series log data for churn predictions.

OBJECTIVE

We aimed to apply a recurrent neural network structure to accept time-series patterns using lifelog data and text message data to predict the churn of digital health care users.

METHODS

This study was based on usage data from a digital health care app in which users exchange interactive messages with human coaches about their food, exercise, and weight logs. Among the users in Korea who enrolled between January 1, 2017 and January 1, 2019, we defined churn users according to two criteria: users who received a refund before the paid program ended and users who received a refund 7 days after the trial period. We used a long short-term memory (LSTM) network with a masking layer to accept sequence data of different lengths. We also performed topic modeling to vectorize text messages. To interpret the contribution of each variable to the model's predictions, we used integrated gradients, an attribution method.
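As an illustration of the attribution step, the following is a minimal sketch of integrated gradients. It is not the study's implementation: instead of the LSTM it uses a toy logistic model so that the gradient can be written analytically, and the weights, feature values, and all-zero baseline are hypothetical. The attribution for each feature is the input-minus-baseline difference multiplied by the average gradient along the straight path from the baseline to the input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Toy stand-in for a churn model: logistic regression on a feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(x, weights, bias):
    """Analytic gradient of predict() with respect to each input feature."""
    p = predict(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = gradient(point, weights, bias)
        totals = [t + g for t, g in zip(totals, grad)]
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical values, chosen only for illustration.
weights = [-1.2, 0.8, 0.3]
bias = 0.1
x = [0.9, 0.4, 0.2]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, weights, bias)
output_change = predict(x, weights, bias) - predict(baseline, weights, bias)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to `output_change`, the difference between the model output at the input and at the baseline.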

RESULTS

A total of 1868 eligible users were included in this study. The final performance of churn prediction was an F1 score of 0.89; that score decreased by 0.12 when the data of the final week were excluded (F1 score 0.77). Additionally, when text data were included, the mean predicted performance increased by approximately 0.085 at every time point. Steps per day had the largest contribution (0.1085). Among the topic variables, poor habits (eg, drinking alcohol, overeating, and late-night eating) showed the largest contribution (0.0875).
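For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall for the positive (churn) class. The sketch below computes it from scratch; the label lists are made up for illustration and are not data from the study.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels, where 1 marks the positive (churn) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = churned, 0 = retained.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(y_true, y_pred)
```

In this toy example there are 4 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.8 and the F1 score is 0.8.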

CONCLUSIONS

The model with a recurrent neural network architecture that used log data and message data demonstrated high performance for churn classification. Additionally, the analysis of the contribution of the variables is expected to help identify signs of user churn in advance and improve the adherence in digital health care.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ad1/7817354/a8253362fc35/jmir_v23i1e22184_fig1.jpg
