Alicante Anita, Amato Flora, Cozzolino Giovanni, Gargiulo Francesco, Improda Nicla, Mazzeo Antonino
Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II.
Stud Health Technol Inform. 2014;207:370-9.
The healthcare domain is characterized by a huge amount of data contained in medical records, reports, test results and so on. To support healthcare workers and manage relevant data effectively and efficiently, it is important to correctly classify the unstructured parts of text embedded in medical documents. In this paper, we propose a classification system for medical record categorization that combines different methodologies based on lexical, syntactic and semantic analysis of the documents. We show that a classification system based on a combination of different text analysis methodologies outperforms each methodology taken alone. The results are presented in terms of Accuracy-Rejection Curves. Finally, the pros and cons of the proposed architecture and some directions for future work are pointed out.
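As a rough illustration of the two ideas named in the abstract, the sketch below fuses the class scores of several classifiers and then traces an accuracy-rejection curve. It makes several assumptions that the abstract does not confirm: the fusion rule is a plain average of per-class scores, rejection is driven by a threshold on the top fused score, and the function names (`combine_scores`, `accuracy_rejection_curve`) and the toy scores are hypothetical, invented only for this example.

```python
from typing import List, Optional, Sequence, Tuple


def combine_scores(per_classifier: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-class scores across classifiers (an assumed fusion rule)."""
    n_clf = len(per_classifier)
    n_docs = len(per_classifier[0])
    combined = []
    for d in range(n_docs):
        n_classes = len(per_classifier[0][d])
        combined.append(
            [sum(clf[d][c] for clf in per_classifier) / n_clf for c in range(n_classes)]
        )
    return combined


def accuracy_rejection_curve(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    thresholds: Sequence[float],
) -> List[Tuple[float, Optional[float]]]:
    """Return (rejection_rate, accuracy_on_accepted) for each threshold.

    A document is rejected when its highest class score is below the threshold.
    Accuracy is None when every document is rejected.
    """
    curve = []
    for t in thresholds:
        accepted = [
            (max(range(len(s)), key=s.__getitem__), y)
            for s, y in zip(scores, labels)
            if max(s) >= t
        ]
        rejection_rate = 1 - len(accepted) / len(scores)
        accuracy = (
            sum(1 for pred, y in accepted if pred == y) / len(accepted)
            if accepted
            else None
        )
        curve.append((rejection_rate, accuracy))
    return curve


# Toy scores for four documents and two classes (illustrative, not from the paper).
lexical = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.2, 0.8]]
semantic = [[0.7, 0.3], [0.3, 0.7], [0.5, 0.5], [0.2, 0.8]]
labels = [0, 1, 0, 1]

combined = combine_scores([lexical, semantic])
curve = accuracy_rejection_curve(combined, labels, thresholds=[0.0, 0.7])
for rejection_rate, accuracy in curve:
    print(f"rejected={rejection_rate:.2f} accuracy={accuracy}")
```

Plotting accuracy against rejection rate for each single classifier and for the fused scores is one way to compare them across operating points: a curve that lies higher at the same rejection rate indicates better accuracy on the documents the system chooses to classify.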