Sedlakova Jana, Daniore Paola, Horn Wintsch Andrea, Wolf Markus, Stanikic Mina, Haag Christina, Sieber Chloé, Schneider Gerold, Staub Kaspar, Alois Ettlin Dominik, Grübner Oliver, Rinaldi Fabio, von Wyl Viktor
Digital Society Initiative, University of Zurich, Zurich, Switzerland.
Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland.
PLOS Digit Health. 2023 Oct 11;2(10):e0000347. doi: 10.1371/journal.pdig.0000347. eCollection 2023 Oct.
Digital data play an increasingly important role in advancing health research and care. However, most digital data in healthcare are unstructured and often not readily accessible for research. Unstructured data typically lack standardization and require substantial preprocessing and feature extraction. This poses challenges when combining such data with other data sources to enhance the existing knowledge base, a process we refer to as digital unstructured data enrichment. Overcoming these methodological challenges requires significant resources, which may limit the ability to fully leverage these data for advancing health research and, ultimately, prevention and patient care delivery. While prevalent challenges associated with using unstructured data in health research are widely reported across the literature, a comprehensive interdisciplinary summary of these challenges, and of possible solutions that facilitate the use of unstructured data in combination with structured data sources, is missing. In this study, we report findings from a systematic narrative review of the seven most prevalent challenge areas connected with digital unstructured data enrichment in the fields of cardiology, neurology and mental health, along with possible solutions to address these challenges. Based on these findings, we developed a checklist that follows the standard data flow in health research studies. This checklist aims to provide initial systematic guidance to inform early planning and feasibility assessments for health research studies that aim to combine unstructured data with existing data sources. Overall, the generality of the unstructured data enrichment methods reported in the studies included in this review calls for more systematic reporting of such methods to achieve greater reproducibility in future studies.