Secondary Use of Healthcare Structured Data: The Challenge of Domain-Knowledge Based Extraction of Features.

Author information

Chazard Emmanuel, Ficheur Grégoire, Caron Alexandre, Lamer Antoine, Labreuche Julien, Cuggia Marc, Genin Michaël, Bouzille Guillaume, Duhamel Alain

Affiliations

CERIM EA2694, Lille University, F-59000 Lille.

Public Health Department, CHU Lille, F-59000 Lille.

Publication information

Stud Health Technol Inform. 2018;255:15-19.

Abstract

Secondary use of clinical structured data plays an important role in healthcare research. It was first described by Fayyad as "knowledge discovery in databases". Feature extraction is an important phase of this process but has received little attention. The objectives of this paper are: 1) to propose an updated representation of data reuse in healthcare, 2) to illustrate the methods and objectives of feature extraction, and 3) to discuss the role of domain-specific knowledge.

MATERIAL AND METHODS

First, an updated representation is proposed. Then, a case study is presented in which acute renal failure is automatically identified and its risk factors are discovered through secondary use of structured data. Finally, a literature review published by Meystre et al. is analyzed.
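
The abstract does not state which rule the case study used to identify acute renal failure. As a hedged sketch only, the snippet below assumes a KDIGO-style serum creatinine criterion (a rise of at least 0.3 mg/dL within 48 hours, or a value at least 1.5 times an earlier one within 7 days), simplified to compare each value against every earlier value in the window rather than against a formal baseline. It also shows a Phase 1 style unit harmonization (1 mg/dL of creatinine corresponds to 88.4 µmol/L). The names `LabResult`, `to_mg_dl`, and `first_arf_onset`, and the unit strings, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Creatinine conversion: 1 mg/dL corresponds to 88.4 µmol/L.
UMOL_PER_MG_DL = 88.4
# Tolerance so that a rise of exactly 0.3 mg/dL is not lost to float rounding.
EPS = 1e-9


@dataclass
class LabResult:
    time: datetime
    value: float
    unit: str  # "mg/dL" or "umol/L" (hypothetical unit codes)


def to_mg_dl(result: LabResult) -> float:
    """Harmonize a creatinine value to mg/dL (a Phase 1 style unit conversion)."""
    if result.unit == "mg/dL":
        return result.value
    if result.unit == "umol/L":
        return result.value / UMOL_PER_MG_DL
    raise ValueError(f"unknown unit: {result.unit}")


def first_arf_onset(results, abs_rise=0.3, abs_window=timedelta(hours=48),
                    rel_rise=1.5, rel_window=timedelta(days=7)):
    """Return the time of the first creatinine value that rises by >= abs_rise
    over any earlier value within abs_window, or reaches >= rel_rise times
    any earlier value within rel_window. Return None if no value qualifies."""
    series = sorted(results, key=lambda r: r.time)
    for i, current in enumerate(series):
        cur = to_mg_dl(current)
        for earlier in series[:i]:
            elapsed = current.time - earlier.time
            prev = to_mg_dl(earlier)
            if elapsed <= abs_window and cur - prev >= abs_rise - EPS:
                return current.time
            if elapsed <= rel_window and cur >= rel_rise * prev - EPS:
                return current.time
    return None


t0 = datetime(2018, 1, 1, 8)
labs = [
    LabResult(t0, 80.0, "umol/L"),                      # about 0.90 mg/dL
    LabResult(t0 + timedelta(hours=24), 1.0, "mg/dL"),
    LabResult(t0 + timedelta(hours=40), 1.4, "mg/dL"),  # +0.49 mg/dL in 40 h
]
print(first_arf_onset(labs))  # 2018-01-03 00:00:00
```

Returning the onset time, rather than a simple yes/no label, matters for the risk-factor analysis: it gives an index time against which earlier exposures can be placed.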

RESULTS

1) We propose a description of data reuse in five phases. Phase 1 is data preprocessing (cleansing, linkage, terminological alignment, unit conversion, de-identification), which enables the construction of a data warehouse. Phase 2 is feature extraction. Phase 3 is statistical and graphical mining. Phase 4 consists of expert filtering and reorganization of the statistical results. Phase 5 is decision making. 2) The case study illustrates how time-dependent features can be extracted from laboratory results and drug administrations using domain-specific knowledge. 3) Among the 200 papers cited by Meystre et al., the first and last authors were affiliated with health institutions in 74% of papers (68% of methodological papers and 79% of applied papers).
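
One kind of time-dependent feature mentioned above can be sketched as follows, under stated assumptions: a drug administration counts as a candidate risk factor only if it occurred within a lookback window before an index time (for example, the onset of acute renal failure), because a drug given after the outcome cannot have contributed to it. The 72-hour window, the record layout, and the names `DrugAdministration`, `exposed_before`, and `drug_features` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrugAdministration:
    time: datetime
    drug: str  # drug name as recorded (hypothetical schema)


def exposed_before(admins, drugs, index_time, lookback=timedelta(hours=72)):
    """Return True if any drug in `drugs` was administered within
    [index_time - lookback, index_time). Administrations at or after
    index_time are ignored because they cannot precede the outcome."""
    window_start = index_time - lookback
    return any(
        a.drug in drugs and window_start <= a.time < index_time
        for a in admins
    )


def drug_features(admins, index_time, feature_defs, lookback=timedelta(hours=72)):
    """Build one boolean feature per entry of `feature_defs` (name -> set of drugs)."""
    return {
        name: exposed_before(admins, drugs, index_time, lookback)
        for name, drugs in feature_defs.items()
    }


onset = datetime(2018, 3, 10, 12)
admins = [
    DrugAdministration(onset - timedelta(hours=30), "gentamicin"),
    DrugAdministration(onset + timedelta(hours=2), "ibuprofen"),  # after onset
]
defs = {
    "aminoglycoside_72h": {"gentamicin", "amikacin"},
    "nsaid_72h": {"ibuprofen", "ketoprofen"},
}
print(drug_features(admins, onset, defs))
# {'aminoglycoside_72h': True, 'nsaid_72h': False}
```

The half-open window `[start, index)` is a design choice: it keeps same-time administrations out of the exposure, which is a conservative reading of "preceding" that a clinician might want to revise.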

DISCUSSION

Feature extraction has a major impact on the success of data reuse. Domain-specific, knowledge-based reasoning plays an important role in feature extraction and requires tight collaboration between computer scientists, statisticians, and health professionals.
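
A small illustration of why domain knowledge shapes features: a purely data-driven approach would treat each raw drug code as its own sparse variable, whereas an expert can pool codes into clinically meaningful classes. The sketch below assumes drugs are coded with WHO ATC codes and uses a hypothetical, deliberately incomplete expert table (NSAIDs under `M01A`, aminoglycosides under `J01GB`); the names `EXPERT_CLASSES`, `classify_atc`, and `class_counts` are illustrative only and do not come from the paper.

```python
from collections import Counter

# Hypothetical expert-curated table: ATC prefix -> clinical class.
# Illustrative only; not an exhaustive list of nephrotoxic drugs.
EXPERT_CLASSES = {
    "M01A": "nsaid",            # anti-inflammatory and antirheumatic products, non-steroids
    "J01GB": "aminoglycoside",  # other aminoglycosides (e.g. gentamicin, J01GB03)
}


def classify_atc(code, classes=EXPERT_CLASSES):
    """Return the class of the longest matching ATC prefix, or None."""
    matches = [prefix for prefix in classes if code.startswith(prefix)]
    if not matches:
        return None
    return classes[max(matches, key=len)]


def class_counts(codes, classes=EXPERT_CLASSES):
    """Count administrations per expert-defined class, ignoring unmapped codes."""
    counts = Counter()
    for code in codes:
        cls = classify_atc(code, classes)
        if cls is not None:
            counts[cls] += 1
    return counts


# Ibuprofen (M01AE01), gentamicin (J01GB03), paracetamol (N02BE01), ketoprofen (M01AE03)
print(class_counts(["M01AE01", "J01GB03", "N02BE01", "M01AE03"]))
# Counter({'nsaid': 2, 'aminoglycoside': 1})
```

The table itself is where collaboration happens: statisticians and computer scientists can write the matching logic, but only clinicians can decide which codes belong together.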
