Developing a healthcare dataset information resource (DIR) based on Semantic Web.

Authors

Shi Jingyi, Zheng Mingna, Yao Lixia, Ge Yaorong

Affiliations

Department of Software and Information Systems, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223, NC, USA.

Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, 55905, MN, USA.

Publication

BMC Med Genomics. 2018 Nov 20;11(Suppl 5):102. doi: 10.1186/s12920-018-0411-5.

DOI: 10.1186/s12920-018-0411-5
PMID: 30453940
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6245488/
Abstract

BACKGROUND

Choosing the right dataset is essential for obtaining the right insights in data science; it is therefore important for data scientists to understand which relevant datasets are available, along with their content, structure, and existing analyses. While a number of efforts are underway to integrate the large volume and variety of datasets, there has long been no information resource focused on the specific needs of the datasets' target users. To address this gap, we have taken a user-oriented approach to develop a Dataset Information Resource (DIR) that gathers relevant dataset knowledge for specific user types. In the present version, we specifically address the challenges entry-level data scientists face in learning to identify, understand, and analyze major healthcare datasets. We emphasize that the DIR does not contain actual data from the datasets; rather, it aims to provide comprehensive knowledge about the datasets and their analyses.

METHODS

The DIR uses Semantic Web technologies, with the W3C Dataset Description Profile as the standard for knowledge integration and representation. To extract knowledge tailored to target users, we developed methods for manual extraction from dataset documentation and for semi-automatic extraction from related publications using natural language processing (NLP)-based approaches. A semantic query component supports knowledge retrieval, and a parameterized question-answering function makes searching easier.
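The abstract does not say which NLP techniques drive the semi-automatic extraction from publications. As a rough illustration only, the Python sketch below assumes the simplest possible approach, dictionary matching: a curated table of dataset names and aliases is searched for in publication text, and each hit is returned together with its sentence so that a human curator can confirm or reject it (the "semi-automatic" part). The alias table, the `Candidate` record, and `find_dataset_mentions` are hypothetical; MIMIC-III and NHANES are example datasets, not necessarily among the 12 the DIR covers.

```python
import re
from dataclasses import dataclass

# Hypothetical alias table: canonical dataset name -> surface forms to look for.
DATASET_ALIASES = {
    "MIMIC-III": ["MIMIC-III", "MIMIC III", "MIMIC-3"],
    "NHANES": ["NHANES", "National Health and Nutrition Examination Survey"],
}


@dataclass
class Candidate:
    dataset: str   # canonical dataset name
    sentence: str  # sentence containing the mention, shown to the curator
    matched: str   # surface form that actually matched


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_dataset_mentions(text, aliases=DATASET_ALIASES):
    """Return one Candidate per (sentence, dataset) hit for manual review."""
    # One case-insensitive pattern per dataset; longer aliases are tried first
    # so a full name wins over a shorter form inside the same alternation.
    patterns = {
        name: re.compile(
            r"\b("
            + "|".join(re.escape(a) for a in sorted(forms, key=len, reverse=True))
            + r")\b",
            re.IGNORECASE,
        )
        for name, forms in aliases.items()
    }
    candidates = []
    for sentence in split_sentences(text):
        for name, pattern in patterns.items():
            m = pattern.search(sentence)
            if m:
                candidates.append(Candidate(name, sentence, m.group(1)))
    return candidates
```

For example, `find_dataset_mentions("We used MIMIC-III to study sepsis.")` yields a single candidate for MIMIC-III. Dictionary matching is chosen here only because it is short and transparent; a real pipeline would more likely pair it with a trained named-entity recognizer.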

RESULTS

The DIR prototype consists of four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The current implementation includes information on 12 commonly used large and complex healthcare datasets. An initial usage evaluation with health informatics novices indicates that the DIR is helpful and beginner-friendly.
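The abstract does not publish the DIR's schema or query text. The sketch below, under stated assumptions, shows how dataset metadata could be held as subject-predicate-object triples and how a parameterized FAQ question could be answered by filling a template slot and joining the resulting triple patterns, a simplified stand-in for a SPARQL basic graph pattern. Property names such as `dct:title` and `dcat:keyword` come from Dublin Core Terms and DCAT, vocabularies the W3C dataset description profile reuses; the `ex:` identifiers, example keywords, templates, and the `match`/`answer` functions are hypothetical.

```python
# Hypothetical dataset metadata as (subject, predicate, object) triples.
# Predicates borrow Dublin Core Terms (dct:) and DCAT (dcat:) names; the
# ex: identifiers and keyword values are illustrative only.
TRIPLES = [
    ("ex:mimic3", "dct:title", "MIMIC-III"),
    ("ex:mimic3", "dcat:keyword", "critical care"),
    ("ex:eicu", "dct:title", "eICU Collaborative Research Database"),
    ("ex:eicu", "dcat:keyword", "critical care"),
    ("ex:nhanes", "dct:title", "NHANES"),
    ("ex:nhanes", "dcat:keyword", "nutrition"),
    ("ex:nhanes", "dcat:keyword", "survey"),
]


def _unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it or check the existing binding
            if term in result and result[term] != value:
                return None
            result[term] = value
        elif term != value:  # constant that does not match
            return None
    return result


def match(patterns, triples=TRIPLES):
    """Join a list of triple patterns and return every consistent binding."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new_binding = _unify(pattern, triple, binding)
                if new_binding is not None:
                    extended.append(new_binding)
        bindings = extended
    return bindings


# Hypothetical FAQ templates: question text with {slots}, the triple patterns
# it maps to, and the variable whose values form the answer.
TEMPLATES = {
    "datasets_by_topic": (
        "Which datasets cover {topic}?",
        [("?d", "dcat:keyword", "{topic}"), ("?d", "dct:title", "?title")],
        "?title",
    ),
    "topics_of_dataset": (
        "What topics does {title} cover?",
        [("?d", "dct:title", "{title}"), ("?d", "dcat:keyword", "?kw")],
        "?kw",
    ),
}


def answer(template_id, **params):
    """Fill a template's slots, run its patterns, return (question, answers)."""
    question, patterns, output_var = TEMPLATES[template_id]
    filled = [tuple(term.format(**params) for term in p) for p in patterns]
    values = sorted({b[output_var] for b in match(filled)})
    return question.format(**params), values
```

Here `answer("datasets_by_topic", topic="critical care")` returns the filled question plus the two matching titles. Slot values are inserted verbatim, so a real implementation would need to validate user input; in this sketch a value beginning with `?` would be read as a variable.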

CONCLUSIONS

We have developed a novel user-oriented DIR that provides dataset knowledge tailored to target user groups. Knowledge about datasets is effectively represented using the Semantic Web. At this initial stage, the DIR already provides sophisticated and relevant knowledge about 12 datasets to help entry-level health informaticians learn healthcare data analysis with suitable datasets. Further development at both the content and functional levels is underway.

Figures (image links from PMC):
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/455a/6245488/065e5bc499ad/12920_2018_411_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/455a/6245488/c6715bff99ef/12920_2018_411_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/455a/6245488/42ecd22751d1/12920_2018_411_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/455a/6245488/e719604e6841/12920_2018_411_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/455a/6245488/ba54aaaa361e/12920_2018_411_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/455a/6245488/7ae7cfcc31f5/12920_2018_411_Fig6_HTML.jpg
