Fong A, Ratwani R
Allan Fong, MS, MedStar Institute for Innovation - National Center for Human Factors in Healthcare, 3007 Tilden St. NW, Suite 7M, Washington, D.C. 20008, USA, E-mail:
Methods Inf Med. 2015;54(4):338-45. doi: 10.3414/ME15-01-0010. Epub 2015 Apr 2.
Patient safety event data repositories have the potential to dramatically improve safety if analyzed and leveraged appropriately. These safety event reports often consist of both structured data, such as general event type categories, and unstructured data, such as free text descriptions of the event. Analyzing these data, particularly the rich free text narratives, can be challenging, especially with tens of thousands of reports. To overcome the resource-intensive manual review of the free text descriptions, we demonstrate the effectiveness of an unsupervised natural language processing approach.
An unsupervised natural language processing technique, called topic modeling, was applied to a large repository of patient safety event data to identify topics, or themes, from the free text descriptions of the data. Entropy measures were used to evaluate and compare these topics to the general event type categories that were originally assigned by the event reporter.
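As a rough illustration of how an entropy measure can compare topics with reporter-assigned categories, the sketch below computes the Shannon entropy of the category distribution within each topic. It assumes each report has already been assigned a single dominant topic. The abstract does not state the exact entropy formulation used, so the function names, the toy data, and the choice of base-2 Shannon entropy are all assumptions made for illustration. Under this sketch, low entropy would suggest a topic that lines up with one existing category, while high entropy would suggest a topic that cuts across categories.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    entropy = 0.0
    if total == 0:
        return entropy
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


def topic_category_entropy(assignments):
    """Map each topic to the entropy of its reporter-assigned categories.

    assignments: iterable of (topic_id, category) pairs, one per report.
    """
    by_topic = defaultdict(Counter)
    for topic, category in assignments:
        by_topic[topic][category] += 1
    return {t: shannon_entropy(c.values()) for t, c in by_topic.items()}


# Hypothetical toy data (not from the study): (dominant topic, event type).
reports = [
    (0, "Medication"), (0, "Medication"), (0, "Medication"), (0, "Medication"),
    (1, "Fall"), (1, "Medication"), (1, "Lab"), (1, "Other"),
]

entropies = topic_category_entropy(reports)
```

In this toy data, topic 0 contains only one category and so has zero entropy, whereas topic 1 is spread evenly over four categories and reaches the maximum of 2 bits for four classes.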
Measures of entropy demonstrated that some topics generated from the unsupervised modeling approach aligned with the clinical general event type categories that were originally selected by the individual entering the report. Importantly, several new latent topics emerged that were not originally identified. The new topics provide additional insights into the patient safety event data that would not otherwise easily be detected.
The topic modeling approach provides a method to identify topics or themes that may not be immediately apparent and has the potential to allow for automatic reclassification of events that are ambiguously classified by the event reporter.