Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review.

Author information

Sedlakova Jana, Daniore Paola, Horn Wintsch Andrea, Wolf Markus, Stanikic Mina, Haag Christina, Sieber Chloé, Schneider Gerold, Staub Kaspar, Alois Ettlin Dominik, Grübner Oliver, Rinaldi Fabio, von Wyl Viktor

Affiliations

Digital Society Initiative, University of Zurich, Zurich, Switzerland.

Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland.

Publication information

PLOS Digit Health. 2023 Oct 11;2(10):e0000347. doi: 10.1371/journal.pdig.0000347. eCollection 2023 Oct.

DOI:10.1371/journal.pdig.0000347
PMID:37819910
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10566734/
Abstract

Digital data play an increasingly important role in advancing health research and care. However, most digital data in healthcare are unstructured and often not readily accessible for research. Unstructured data typically lack standardization and require substantial preprocessing and feature extraction. This poses challenges when such data are combined with other data sources to enhance the existing knowledge base, a process we refer to as digital unstructured data enrichment. Overcoming these methodological challenges requires significant resources and may limit the ability to fully leverage the potential of these data for advancing health research and, ultimately, prevention and patient care delivery. While prevalent challenges associated with unstructured data use in health research are widely reported across the literature, a comprehensive interdisciplinary summary of such challenges, and of possible solutions that facilitate their use in combination with structured data sources, is missing. In this study, we report findings from a systematic narrative review on the seven most prevalent challenge areas connected with digital unstructured data enrichment in the fields of cardiology, neurology and mental health, along with possible solutions to address these challenges. Based on these findings, we developed a checklist that follows the standard data flow in health research studies. This checklist aims to provide initial systematic guidance to inform early planning and feasibility assessments for health research studies that aim to combine unstructured data with existing data sources. Overall, the generality with which unstructured data enrichment methods were reported in the included studies calls for more systematic reporting of such methods to achieve greater reproducibility in future studies.
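To make the idea of digital unstructured data enrichment more concrete, here is a minimal, hypothetical Python sketch; it is not the method of the review or of any included study. It extracts one feature, a left ventricular ejection fraction (LVEF) percentage, from free-text notes with a simple regular expression and attaches it to structured patient records through a shared patient ID. The record layout, patient IDs, regex pattern and the function names `extract_lvef` and `enrich` are all illustrative assumptions; real studies generally need far more preprocessing (for example de-identification, negation handling and unit normalization).

```python
import re
from typing import Optional

# Hypothetical structured source, e.g. a registry table keyed by patient ID.
STRUCTURED = {
    "P001": {"age": 54},
    "P002": {"age": 67},
    "P004": {"age": 71},
}

# Hypothetical unstructured source: free-text clinical notes keyed by the same ID.
NOTES = {
    "P001": "Pt reports palpitations. Echo: LVEF 45%. Smoker.",
    "P002": "No chest pain. LVEF of 60 % on last echo.",
    "P003": "Follow-up visit, no imaging performed.",
}

# Illustrative extraction rule (an assumption, not a validated pattern):
# "LVEF", up to five non-digit characters, a 1-2 digit number, then "%".
LVEF_PATTERN = re.compile(r"LVEF\D{0,5}(\d{1,2})\s*%", re.IGNORECASE)


def extract_lvef(text: str) -> Optional[int]:
    """Return the first LVEF percentage found in a note, or None if absent."""
    match = LVEF_PATTERN.search(text)
    return int(match.group(1)) if match else None


def enrich(structured: dict, notes: dict) -> dict:
    """Left-join the extracted feature onto the structured records by patient ID."""
    enriched = {}
    for pid, record in structured.items():
        row = dict(record)  # copy so the source table is not modified
        note = notes.get(pid)
        # No note for this patient -> feature stays missing (None).
        row["lvef"] = extract_lvef(note) if note is not None else None
        enriched[pid] = row
    # Notes without a structured record (here P003) are not carried over.
    return enriched


if __name__ == "__main__":
    print(enrich(STRUCTURED, NOTES))
    # {'P001': {'age': 54, 'lvef': 45}, 'P002': {'age': 67, 'lvef': 60}, 'P004': {'age': 71, 'lvef': None}}
```

The join keeps every structured record and stores the feature as `None` when no note exists, so missingness introduced by the enrichment step stays visible; notes without a matching structured record (here `P003`) are dropped, which illustrates why a reliable linkage key matters when combining data sources.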

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cba/10566734/d9f974671830/pdig.0000347.g001.jpg

Similar articles

1
Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review.
PLOS Digit Health. 2023 Oct 11;2(10):e0000347. doi: 10.1371/journal.pdig.0000347. eCollection 2023 Oct.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.
4
Beyond the black stump: rapid reviews of health research issues affecting regional, rural and remote Australia.
Med J Aust. 2020 Dec;213 Suppl 11:S3-S32.e1. doi: 10.5694/mja2.50881.
5
How has the impact of 'care pathway technologies' on service integration in stroke care been measured and what is the strength of the evidence to support their effectiveness in this respect?
Int J Evid Based Healthc. 2008 Mar;6(1):78-110. doi: 10.1111/j.1744-1609.2007.00098.x.
6
The Effectiveness of Integrated Care Pathways for Adults and Children in Health Care Settings: A Systematic Review.
JBI Libr Syst Rev. 2009;7(3):80-129. doi: 10.11124/01938924-200907030-00001.
7
The Experience and Effectiveness of Nurse Practitioners in Orthopaedic Settings: A Comprehensive Systematic Review.
JBI Libr Syst Rev. 2012;10(42 Suppl):1-22. doi: 10.11124/jbisrir-2012-249.
8
The effectiveness of internet-based e-learning on clinician behavior and patient outcomes: a systematic review protocol.
JBI Database System Rev Implement Rep. 2015 Jan;13(1):52-64. doi: 10.11124/jbisrir-2015-1919.
9
Tuberculosis

Cited by

1
Multimodal Integration in Health Care: Development With Applications in Disease Management.
J Med Internet Res. 2025 Aug 21;27:e76557. doi: 10.2196/76557.
2
EchoLLM: extracting echocardiogram entities with light-weight, open-source large language models.
JAMIA Open. 2025 Aug 13;8(4):ooaf092. doi: 10.1093/jamiaopen/ooaf092. eCollection 2025 Aug.
3
In patients' words: natural language processing of reports from patients experiencing orofacial pain and dysfunction.
J Headache Pain. 2025 Jul 30;26(1):172. doi: 10.1186/s10194-025-02095-z.
4
A survey of NLP methods for oncology in the past decade with a focus on cancer registry applications.
Artif Intell Rev. 2025;58(10):314. doi: 10.1007/s10462-025-11316-5. Epub 2025 Jul 16.
5
Fine-tuning of language models for automated structuring of medical exam reports to improve patient screening and analysis.
Sci Rep. 2025 Jul 4;15(1):23949. doi: 10.1038/s41598-025-05695-6.
6
Iterative refinement and goal articulation to optimize large language models for clinical information extraction.
NPJ Digit Med. 2025 May 23;8(1):301. doi: 10.1038/s41746-025-01686-z.
7
A Review of the Applications, Benefits, and Challenges of Generative AI for Sustainable Toxicology.
Curr Res Toxicol. 2025 Apr 21;8:100232. doi: 10.1016/j.crtox.2025.100232. eCollection 2025.
8
ATCodeR: a dictionary-based R-tool to standardize medication free-text.
Sci Rep. 2025 Apr 10;15(1):12252. doi: 10.1038/s41598-025-97150-9.
9
Understanding the Policy Space for AgeTech: Implications for AI and Digital Health.
Public Policy Aging Rep. 2024;34(4):144-149. doi: 10.1093/ppar/prae023. Epub 2024 Nov 28.
10
Implementing Accuracy, Completeness, and Traceability for Data Reliability.
JAMA Netw Open. 2025 Mar 3;8(3):e250128. doi: 10.1001/jamanetworkopen.2025.0128.