Chazard Emmanuel, Ficheur Grégoire, Caron Alexandre, Lamer Antoine, Labreuche Julien, Cuggia Marc, Genin Michaël, Bouzille Guillaume, Duhamel Alain
CERIM EA2694, Lille University, F-59000 Lille.
Public Health Department, CHU Lille, F-59000 Lille.
Stud Health Technol Inform. 2018;255:15-19.
Secondary use of structured clinical data plays an important role in healthcare research. It was first described by Fayyad as "knowledge discovery in databases". Feature extraction is an important phase of this process but has received little attention. The objectives of this paper are: 1) to propose an updated representation of data reuse in healthcare, 2) to illustrate the methods and objectives of feature extraction, and 3) to discuss the place of domain-specific knowledge.
First, an updated representation is proposed. Then, a case study automatically identifies acute renal failure and discovers its risk factors through secondary use of structured data. Finally, a literature review published by Meystre et al. is analyzed.
1) We propose a description of data reuse in five phases. Phase 1 is data preprocessing (cleaning, linkage, terminological alignment, unit conversion, de-identification), which enables the construction of a data warehouse. Phase 2 is feature extraction. Phase 3 is statistical and graphical mining. Phase 4 consists of expert filtering and reorganization of the statistical results. Phase 5 is decision making. 2) The case study illustrates how domain-specific knowledge enables the extraction of time-dependent features from laboratory results and drug administrations. 3) Among the 200 papers cited by Meystre et al., the proportion of first and last authors affiliated with health institutions was 74% (68% for methodological papers and 79% for applied papers).
Feature extraction has a major impact on the success of data reuse. Domain-specific, knowledge-based reasoning plays an important role in feature extraction, which requires close collaboration between computer scientists, statisticians, and health professionals.
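To make the idea of a knowledge-based, time-dependent feature more concrete, the following minimal Python sketch derives a binary "creatinine rise" feature from a patient's laboratory results, in the spirit of the acute renal failure case study. It is an illustration only: the `LabResult` record layout, the function name `has_creatinine_rise`, the 1.5-fold ratio, and the 7-day window are all assumptions introduced here, not the rule or data model used by the authors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LabResult:
    """One serum creatinine measurement (hypothetical record layout)."""
    patient_id: str
    taken_at: datetime
    creatinine_umol_l: float


def has_creatinine_rise(results, ratio=1.5, window=timedelta(days=7)):
    """Return True if any measurement is at least `ratio` times an earlier
    measurement taken no more than `window` before it.

    The default ratio and window are illustrative assumptions, not the
    definition used in the paper.
    """
    ordered = sorted(results, key=lambda r: r.taken_at)
    for i, later in enumerate(ordered):
        for earlier in ordered[:i]:
            within_window = later.taken_at - earlier.taken_at <= window
            risen = later.creatinine_umol_l >= ratio * earlier.creatinine_umol_l
            if within_window and risen:
                return True
    return False


if __name__ == "__main__":
    t0 = datetime(2018, 1, 1)
    stay = [
        LabResult("p1", t0, 80.0),
        LabResult("p1", t0 + timedelta(days=3), 130.0),
    ]
    print(has_creatinine_rise(stay))
```

The point of the sketch is that the raw data (a series of timestamped values) only becomes usable for statistical mining once a clinical rule turns it into a feature, and choosing that rule requires input from health professionals.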