Zhu Huijia, Ni Yuan, Cai Peng, Qiu Zhaoming, Cao Feng
IBM China Research Lab, Shanghai, People's Republic of China.
Stud Health Technol Inform. 2012;180:589-93.
In evidence-based medicine (EBM), the PICO format is designed to make searching for the best available evidence easy and accurate. As the main element of PICO, the Patient/Problem (P) element represents the attributes of the patient in clinical questions and studies. Identifying patient attributes is therefore crucial to understanding clinical problems. Because natural language is so rich, issues such as varied term representations, grammatical structures, and abbreviations make it challenging to extract patient-related attributes automatically from unstructured data. In this paper, we employ natural language processing (NLP) techniques to analyze the linguistic characteristics of these attributes in depth. Based on the results of this analysis, we built rule sets for the different attributes and applied a rule-based approach to extract the patient-related attributes.
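To make the idea of per-attribute rule sets concrete, the following is a minimal sketch of rule-based extraction for two patient attributes, age and gender. It is not the rule set built in the paper: the attribute names, the regular expressions, and the function `extract_patient_attributes` are all hypothetical, and plain regular expressions stand in for the deeper NLP analysis the abstract describes.

```python
import re

# Hypothetical rule set: one pattern per patient attribute.
# The age rule also accepts the common abbreviations "yo" and "y/o".
RULES = {
    "age": re.compile(r"\b(\d{1,3})\s*(?:-?\s*year[- ]old|y/?o)\b", re.IGNORECASE),
    "gender": re.compile(r"\b(male|female|man|woman|boy|girl)\b", re.IGNORECASE),
}

# Map different surface terms onto one normalized value.
GENDER_NORMALIZATION = {
    "man": "male",
    "boy": "male",
    "woman": "female",
    "girl": "female",
}


def extract_patient_attributes(text):
    """Return a dict of the attributes whose rule matches the text."""
    found = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        if match is None:
            continue
        value = match.group(1).lower()
        if name == "age":
            found[name] = int(value)
        else:
            found[name] = GENDER_NORMALIZATION.get(value, value)
    return found


if __name__ == "__main__":
    print(extract_patient_attributes("A 65-year-old woman with type 2 diabetes"))
    print(extract_patient_attributes("72 yo male, smoker"))
```

Keeping one rule per attribute means a new attribute, or a new abbreviation for an existing one, can be added without touching the extraction loop. A realistic system would need far more rules and linguistic context than this sketch covers.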