Alicante Anita, Corazza Anna, Isgrò Francesco, Silvestri Stefano
Dipartimento di Ingegneria Elettrica e delle Tecnologie dell'Informazione, Università di Napoli Federico II, Italy.
Stud Health Technol Inform. 2014;207:340-9.
This paper discusses the application of an unsupervised text mining technique to information extraction from clinical records written in Italian. The approach has two steps. First, a metathesaurus is used together with natural language processing tools to extract the domain entities. Then, clustering is applied to explore relations between entity pairs. A preliminary experiment, performed on text extracted from 57 medical records containing more than 20,000 potential relations, indicates that the clustering should be based on the cosine distance rather than the City Block or Hamming distance.
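To make the comparison of distance measures concrete, the following minimal sketch computes the three distances named in the abstract on toy vectors. It is an illustration only, not the authors' implementation: the feature vectors (imagined here as context-word counts for an entity pair), the variable names, and the convention of normalising the Hamming distance by vector length are all assumptions.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; depends on direction, not magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: treat an all-zero vector as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def city_block_distance(u, v):
    """Sum of absolute coordinate differences (L1 / Manhattan)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def hamming_distance(u, v):
    """Fraction of positions at which the two vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)


# Hypothetical count vectors for three entity pairs.
pair_a = [2, 0, 1, 0]
pair_b = [4, 0, 2, 0]  # same context profile as pair_a, twice as frequent
pair_c = [0, 3, 0, 1]  # no context words shared with pair_a

for name, other in (("pair_b", pair_b), ("pair_c", pair_c)):
    print(
        name,
        round(cosine_distance(pair_a, other), 3),
        city_block_distance(pair_a, other),
        hamming_distance(pair_a, other),
    )
```

In this toy case the cosine distance treats `pair_a` and `pair_b` as essentially identical because it ignores vector magnitude, whereas the City Block and Hamming distances both register a difference. This is one plausible reason a cosine-based measure could suit sparse count features, although the abstract itself does not state why it performed better.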