Universitat Rovira i Virgili; Department of Psychology, Centre de Recerca en Avaluació i Mesura de la Conducta, Institut d'Investigació Sanitària Pere Virgili, Tarragona, Spain.
JMIR Mhealth Uhealth. 2021 Apr 19;9(4):e26471. doi: 10.2196/26471.
A huge number of health-related apps are available, and that number is growing fast. However, many of them have been developed without any kind of quality control. Several guides have been developed in an attempt to support the development of high-quality apps and to enable existing apps to be assessed.
The main aim of this study was to examine the interrater reliability of a new guide, the Mobile App Development and Assessment Guide (MAG), and to compare it with one of the most widely used guides in the field, the Mobile App Rating Scale (MARS). We also examined whether the interrater reliability of the measures is consistent across multiple types of apps and stakeholders.
To study the interrater reliability of the MAG and MARS, we evaluated the 4 most downloaded health apps for chronic health conditions in the medical category of the iOS and Android stores (ie, App Store and Google Play). A group of 8 reviewers independently evaluated the quality of the apps using the MAG and MARS. The reviewers were chosen to represent the individuals most knowledgeable about, and most interested in, the use and development of health-related apps, and included different types of stakeholders: clinical researchers, engineers, health care professionals, and end users as potential patients. To study the interrater reliability, we calculated the Krippendorff alpha for every category in the 2 guides, for each type of reviewer, and for every app, both separately and combined.
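As an illustration of the reliability computation described above, the following is a minimal sketch of how a per-category Krippendorff alpha could be obtained, assuming the third-party Python package krippendorff; the category name, the rating matrix, and the scores below are purely hypothetical and are not the study's data.

```python
# Minimal sketch: Krippendorff alpha for one guide category, assuming the
# third-party "krippendorff" package (pip install krippendorff).
# The ratings below are illustrative placeholders, not the study's data.
import numpy as np
import krippendorff

# Rows = reviewers, columns = rated items within one category (eg, "Security");
# np.nan marks a missing rating.
ratings_security = np.array([
    [4, 5, 4, 4],
    [4, 4, 4, 5],
    [5, 5, 4, 4],
    [4, 4, np.nan, 4],
])

# Likert-type scores are ordinal, so the ordinal difference function is used.
alpha = krippendorff.alpha(
    reliability_data=ratings_security,
    level_of_measurement="ordinal",
)
print(f"Krippendorff alpha (Security, hypothetical data): {alpha:.2f}")
```

In the study, this kind of computation would be repeated for every category of each guide, for each type of reviewer and each app, separately and combined.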
Only a few categories of the MAG and MARS demonstrated high interrater reliability. Although the MAG performed better overall, there was considerable variation in the scores between the different types of reviewers. The categories with the highest interrater reliability in the MAG were "Security" (α=0.78) and "Privacy" (α=0.73). Two other categories, "Usability" and "Safety," came very close to acceptable reliability among health care professionals (α=0.62 and α=0.61, respectively). The total interrater reliability of the MAG (ie, across all categories) was 0.45, whereas the total interrater reliability of the MARS was 0.29.
This study shows that some categories of the MAG have significant interrater reliability. Importantly, the data show that the MAG scores better than the MARS, the most commonly used guide in the area. However, there is great variability in the responses, which appears to be associated with subjective interpretation by the reviewers.