TIC Salut Social-Ministry of Health, 08028 Barcelona, Spain.
CRES&CEXS-Pompeu Fabra University, 08003 Barcelona, Spain.
Int J Environ Res Public Health. 2020 Feb 9;17(3):1093. doi: 10.3390/ijerph17031093.
Background: The primary care service in Catalonia has operated an asynchronous teleconsultation service between GPs and patients (eConsulta) since 2015, which has generated some 500,000 messages. New developments in big data analysis tools, particularly those for natural language, can be used to evaluate the impact of the service accurately and systematically.
Objective: To assess the predictive potential of eConsulta messages using different combinations of text vector representations and machine learning algorithms, and to evaluate the performance of those combinations.
Methods: Twenty machine learning models (five types of algorithm, each combined with four text representation techniques) were trained on a sample of 3559 messages (169,102 words) from 2268 teleconsultations (1.57 messages per teleconsultation) to predict three variables of interest: avoiding the need for a face-to-face visit, increased demand, and type of use of the teleconsultation. The performance of each combination was measured in terms of precision, sensitivity, F-value and the ROC curve.
Results: The best-trained algorithms are generally effective, and they are more robust when approximating the two binary variables "avoiding the need for a face-to-face visit" and "increased demand" (precision = 0.98 and 0.97, respectively) than the variable "type of query" (precision = 0.48).
Conclusions: To the best of our knowledge, this study is the first to investigate a machine learning strategy for text classification using primary care teleconsultation datasets. The study illustrates what text analysis using artificial intelligence may be capable of. Developing a robust text classification tool may be feasible once it is validated with more data, which would make it potentially more useful as decision support for health professionals.
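The evaluation design described in the abstract (five algorithm types crossed with four text representations, each scored by precision, sensitivity and F-value) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the abstract does not name the algorithms or representations, so `ALGORITHMS` and `REPRESENTATIONS` hold placeholder labels, `binary_metrics` is a hypothetical helper, the F-value is taken to be the F1 score, and the labels are toy data rather than study data. The ROC curve is omitted because it requires predicted scores, not just predicted labels.

```python
from itertools import product

# Placeholder labels only: the abstract reports the counts (five algorithm
# types, four text representations) but not which ones were used.
ALGORITHMS = [f"algorithm_{i}" for i in range(1, 6)]
REPRESENTATIONS = [f"representation_{j}" for j in range(1, 5)]

# Every algorithm paired with every representation: 5 x 4 = 20 combinations.
combinations = list(product(ALGORITHMS, REPRESENTATIONS))


def binary_metrics(y_true, y_pred):
    """Return precision, sensitivity (recall) and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + sensitivity
    # Assumption: "F-value" is read as F1, the harmonic mean of the two.
    f_value = 2 * precision * sensitivity / total if total else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f_value": f_value,
    }


# Toy labels (not study data) for one binary target such as
# "avoiding the need for a face-to-face visit".
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)

print(len(combinations), "combinations")
print(metrics)
```

In a real comparison, each of the 20 combinations would be trained and scored in this way on held-out messages, once per target variable. The three-class "type of query" variable would additionally need a per-class or averaged version of these metrics.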