Almowil Zahra, Zhou Shang-Ming, Brophy Sinead, Croxall Jodie
Data Science Building, Medical School, Swansea University, Swansea, Wales, United Kingdom.
Centre For Health Technology, Faculty of Health, University of Plymouth, Plymouth, United Kingdom.
JMIR Hum Factors. 2022 Mar 15;9(1):e31021. doi: 10.2196/31021.
Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. As a result, researchers and health professionals often use different phenotype definitions for the same condition. This makes it difficult to compare findings across studies and hinders repeatable and reusable research.
This study aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, in the development of a data portal for phenotypes (a concept library).
This was a qualitative study using interviews and a focus group discussion. One-to-one interviews were conducted with researchers, clinicians, machine learning experts, and senior research managers in health data science (N=6) to explore their specific needs in the development of a concept library. In addition, a focus group discussion was held with researchers (N=14) working with the Secure Anonymised Information Linkage (SAIL) Databank, a national eHealth data linkage infrastructure, to perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis of the phenotyping system and the proposed concept library. The interviews and focus group discussion were transcribed verbatim, and 2 thematic analyses were performed.
Most participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified many requirements that must be met before it is developed. Although all participants stated that they were aware of some existing concept libraries, most expressed negative perceptions of them. The participants mentioned several facilitators that would encourage them to share their work and reuse the work of others, as well as several barriers that could inhibit them from doing so. They also suggested developments that they would like to see to improve reproducible research output using routine data.
The study indicated that most interviewees valued a concept library for phenotypes. However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform. Analysis of interviews and the focus group discussion revealed that different stakeholders have different requirements, facilitators, barriers, and concerns about a prototype concept library.