Applying AI to Support Categorization of Heterogeneous Epidemiological Datasets.

Authors

Sasse Julia, Fabre Guillaume, Fortier Isabel, Zimmermann Pierre, Fluck Juliane

Affiliations

ZB MED - Information Centre for Life Sciences, Cologne, Germany, https://ror.org/0259fwx54.

Maelstrom Research, Research Institute of the McGill University Health Centre, Montreal, Canada.

Publication information

Stud Health Technol Inform. 2025 May 15;327:848-852. doi: 10.3233/SHTI250479.

DOI: 10.3233/SHTI250479
PMID: 40380587
Abstract

The significance of Findable, Accessible, Interoperable, and Reusable (FAIR) data is increasing, particularly for enhancing data reuse in research. The National Research Data Infrastructure for Personal Health Data (NFDI4Health) aims to improve the findability, reusability, and interoperability of health data from epidemiological, clinical, and public health studies. NFDI4Health has established the German Central Health Study Hub to improve the findability of health data through rich metadata. The Maelstrom Catalog, provided by Maelstrom Research, offers a comprehensive dataset of labeled and harmonized study variables, enhancing the findability and reusability of epidemiological data. Both platforms rely on standardized categorization to optimize data reuse. To support this process, NFDI4Health developed the Metadata Annotation Workbench, which supports metadata annotation with standardized vocabularies. This paper presents an AI solution for automatic classification and annotation, integrated into this service and based on a BioBERT text classifier. The model achieved a weighted F1-score of over 92% and improved annotation performance, particularly for non-experts. It accelerates the categorization of study variables, thereby enhancing data findability and reuse. We are confident that further development of such AI approaches will reduce curatorial workload and promote semantically annotated, interoperable data catalogs.
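The abstract reports classifier performance as a weighted F1-score. As a minimal sketch (not the authors' evaluation code, which the abstract does not describe), this metric averages the per-class F1 scores weighted by each class's support, i.e. F1_weighted = Σ_c (n_c / N) · F1_c, where n_c is the number of true instances of category c and N is the total number of items. Frequent categories therefore count more. The category labels in the example are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    if total == 0:
        return 0.0
    support = Counter(y_true)  # number of true instances per category
    score = 0.0
    for label, n_true in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight each class by its share of true labels
    return score


# Hypothetical variable categories, for illustration only.
y_true = ["diet", "diet", "smoking", "sleep"]
y_pred = ["diet", "smoking", "smoking", "sleep"]
print(round(weighted_f1(y_true, y_pred), 4))  # → 0.75
```

Classes that appear only in the predictions have zero support and thus zero weight, which matches the `average="weighted"` option of scikit-learn's `f1_score`. On class-imbalanced variable catalogs, the weighted score can differ noticeably from the unweighted macro average.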

Similar articles

1. Applying AI to Support Categorization of Heterogeneous Epidemiological Datasets.
   Stud Health Technol Inform. 2025 May 15;327:848-852. doi: 10.3233/SHTI250479.
2. Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.
   Online J Public Health Inform. 2024 Aug 1;16:e56237. doi: 10.2196/56237.
3. Chronic disease outcome metadata from German observational studies - public availability and FAIR principles.
   Sci Data. 2023 Dec 5;10(1):868. doi: 10.1038/s41597-023-02726-7.
4. [Making COVID-19 research data more accessible-building a nationwide information infrastructure].
   Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2021 Sep;64(9):1084-1092. doi: 10.1007/s00103-021-03386-x. Epub 2021 Jul 23.
5. Towards an Interoperability Landscape for a National Research Data Infrastructure for Personal Health Data.
   Sci Data. 2024 Jul 13;11(1):772. doi: 10.1038/s41597-024-03615-3.
6. Developing a standardized but extendable framework to increase the findability of infectious disease datasets.
   Sci Data. 2023 Feb 23;10(1):99. doi: 10.1038/s41597-023-01968-9.
7. NFDI4Health Local Data Hubs Implementing a Tailored Metadata Schema for Health Data.
   Stud Health Technol Inform. 2024 Aug 30;317:115-122. doi: 10.3233/SHTI240845.
8. An Annotation Workbench for Semantic Annotation of Data Collection Instruments.
   Stud Health Technol Inform. 2023 May 18;302:108-112. doi: 10.3233/SHTI230074.
9. From Raw Data to FAIR Data: The FAIRification Workflow for Health Research.
   Methods Inf Med. 2020 Jun;59(S 01):e21-e32. doi: 10.1055/s-0040-1713684. Epub 2020 Jul 3.
10. HL7 FHIR in Health Research: A FHIR Specification for Metadata in Clinical, Epidemiological, and Public Health Studies.
    Stud Health Technol Inform. 2024 Aug 22;316:1960-1961. doi: 10.3233/SHTI240817.