How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs.

Affiliations

Ada Health GmbH, Berlin, Germany

Publication details

BMJ Open. 2020 Dec 16;10(12):e040269. doi: 10.1136/bmjopen-2020-040269.

Abstract

OBJECTIVES

To compare breadth of condition coverage, accuracy of suggested conditions and appropriateness of urgency advice of eight popular symptom assessment apps.

DESIGN

Vignettes study.

SETTING

200 primary care vignettes.

INTERVENTION/COMPARATOR

For eight apps and seven general practitioners (GPs): breadth of coverage, and accuracy of condition suggestions and urgency advice, measured against the vignettes' gold standard.

PRIMARY OUTCOME MEASURES

(1) Proportion of conditions 'covered' by an app, that is, not excluded because the user was too young/old or pregnant, or not modelled; (2) proportion of vignettes with the correct primary diagnosis among the top 3 conditions suggested; (3) proportion of 'safe' urgency advice (ie, at gold standard level, more conservative, or no more than one level less conservative).
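The three primary outcome measures are simple proportions. The Python sketch below illustrates them under stated assumptions and is not the authors' analysis code. The `VignetteResult` structure and its field names are hypothetical. Urgency advice is assumed to be encoded as ordered integers, with higher values meaning more urgent (more conservative) advice. Top-3 accuracy is assumed to use every vignette as the denominator, so an app that does not cover a vignette scores a miss on it. The safety rate is computed only over vignettes where urgency advice was given, matching the results below.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VignetteResult:
    """One app's output for one vignette (hypothetical structure)."""
    gold_diagnosis: str
    gold_urgency: int                 # assumed ordinal scale: higher = more urgent
    covered: bool                     # False if excluded (age, pregnancy, condition not modelled)
    suggestions: List[str] = field(default_factory=list)  # ranked, most likely first
    urgency: Optional[int] = None     # None when no urgency advice was given


def coverage(results: List[VignetteResult]) -> float:
    """Measure (1): share of vignettes the app did not exclude."""
    return sum(r.covered for r in results) / len(results)


def top3_accuracy(results: List[VignetteResult]) -> float:
    """Measure (2): gold diagnosis among the first three suggestions.

    The denominator is every vignette (an assumption), so uncovered vignettes count as misses.
    """
    hits = sum(r.gold_diagnosis in r.suggestions[:3] for r in results)
    return hits / len(results)


def is_safe(advice: int, gold: int) -> bool:
    """Safe = at gold-standard level, more urgent, or at most one level less urgent."""
    return advice >= gold - 1


def safe_urgency_rate(results: List[VignetteResult]) -> float:
    """Measure (3): computed only over vignettes where urgency advice was given."""
    advised = [r for r in results if r.urgency is not None]
    return sum(is_safe(r.urgency, r.gold_urgency) for r in advised) / len(advised)


if __name__ == "__main__":
    # Three made-up vignettes, for illustration only
    demo = [
        VignetteResult("migraine", 2, True, ["tension headache", "migraine", "sinusitis"], 2),
        VignetteResult("appendicitis", 4, True,
                       ["gastroenteritis", "constipation", "ovarian cyst", "appendicitis"], 2),
        VignetteResult("pre-eclampsia", 4, False),
    ]
    print(coverage(demo), top3_accuracy(demo), safe_urgency_rate(demo))
```

In this toy example, coverage is 2/3. Top-3 accuracy is 1/3 because the appendicitis vignette's correct diagnosis is ranked fourth. The safety rate is 1/2 because advice two levels below the gold standard counts as unsafe.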

RESULTS

Condition-suggestion coverage was highly variable, and some apps did not offer a suggestion for many users. In alphabetical order: Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; WebMD: 93.0%; Your.MD: 64.5%. Top-3 suggestion accuracy was: GPs (average): 82.1%±5.2%; Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps excluded certain user demographics or conditions, and their performance was generally better when the corresponding vignettes were excluded. For safe urgency advice, the tested GPs averaged 97.0%±2.5%. Among the vignettes where advice was provided, only three apps had safety performance within 1 SD of the GPs (Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%). One app had safety performance within 2 SDs of the GPs (Your.MD: 92.6%). Three apps had safety performance outside 2 SDs of the GPs (Buoy: 80.0%, p<0.001; K Health: 81.3%, p<0.001; Mediktor: 87.3%, p=1.3×10⁻³).
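The grouping of apps by standard deviations from the GP average is plain arithmetic on the reported figures. With a GP mean of 97.0% and an SD of 2.5%, the lower edge of the 1 SD band is 94.5% and the lower edge of the 2 SD band is 92.0%. The sketch below reproduces this banding for the seven apps with a reported safety figure. It does not reproduce the authors' significance tests, and the function name `sd_band` is hypothetical.

```python
# Safe-urgency-advice rates reported in the abstract (percent, vignettes with advice only)
GP_MEAN, GP_SD = 97.0, 2.5
APP_SAFETY = {
    "Ada": 97.0, "Babylon": 95.1, "Buoy": 80.0, "K Health": 81.3,
    "Mediktor": 87.3, "Symptomate": 97.8, "Your.MD": 92.6,
}


def sd_band(rate: float, mean: float = GP_MEAN, sd: float = GP_SD) -> int:
    """Return 1 if within 1 SD of the GP mean, 2 if within 2 SDs, otherwise 3."""
    distance = abs(rate - mean)
    if distance <= sd:
        return 1
    if distance <= 2 * sd:
        return 2
    return 3


if __name__ == "__main__":
    for app, rate in APP_SAFETY.items():
        print(f"{app}: {rate:.1f}% -> band {sd_band(rate)}")
```

Running it places Ada, Babylon and Symptomate in band 1, Your.MD in band 2, and Buoy, K Health and Mediktor in band 3. This matches the grouping stated in the results.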

CONCLUSIONS

The utility of digital symptom assessment apps depends on coverage, accuracy and safety. While no digital tool outperformed GPs, some came close, and the iterative nature of software development offers scalable improvements to care.
