Riedl Bill, Than Nhan, Hogarth Michael
Pathology Informatics Core, UC Davis Dept. of Pathology and Laboratory Medicine, UC Davis School of Medicine, Davis, CA.
AMIA Annu Symp Proc. 2010 Nov 13;2010:677-81.
Cause of death data is an invaluable resource for shaping our understanding of population health. Mortality statistics are one of the principal sources of health information and, in many countries, the most reliable source of health data [1]. Faster classification of this data could significantly improve public health efforts. Currently, cause of death data is captured in unstructured form and takes months to process. We hypothesize that this process can be automated, at least partially, using simple statistical natural language processing (NLP) techniques with the Unified Medical Language System (UMLS) as a vocabulary resource. We built a system, Medical Match Master (MMM), to test this hypothesis, and we evaluate this simple NLP approach on the classification of causes of death. The technique performed well when we used a large biomedical vocabulary and applied certain syntactic maneuvers made possible by textual relationships within the vocabulary.
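To make the general idea concrete, the following is a minimal sketch of vocabulary-based matching of free-text causes of death. It is not the MMM implementation. The tiny `VOCAB` table, its concept identifiers, the Dice token-overlap score, and the function names are all assumptions chosen for illustration. A real system would load terms and synonyms from UMLS.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary: concept ID -> synonymous strings.
# The IDs and terms are illustrative placeholders, not real UMLS content.
VOCAB = {
    "C_MI": ["myocardial infarction", "heart attack"],
    "C_PNEU": ["pneumonia", "lung infection"],
    "C_CHF": ["congestive heart failure", "heart failure"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dice(query_tokens, term_tokens):
    """Dice coefficient over token multisets (an assumed scoring choice)."""
    q, t = Counter(query_tokens), Counter(term_tokens)
    overlap = sum((q & t).values())
    total = sum(q.values()) + sum(t.values())
    return 2 * overlap / total if total else 0.0


def classify(text, vocab=VOCAB):
    """Return (concept_id, score) for the best-scoring vocabulary term."""
    tokens = tokenize(text)
    best_id, best_score = None, 0.0
    for concept_id, terms in vocab.items():
        for term in terms:
            s = dice(tokens, tokenize(term))
            if s > best_score:
                best_id, best_score = concept_id, s
    return best_id, best_score


if __name__ == "__main__":
    print(classify("Acute heart attack"))
```

Listing several strings under one concept is a simple stand-in for the synonym relationships a large vocabulary provides: an entry written as "heart attack" can still resolve to the same concept as "myocardial infarction".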