Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA.
BMC Med Inform Decis Mak. 2012 May 8;12:37. doi: 10.1186/1472-6947-12-37.
Death records are a rich source of data that can be used to assist with public health surveillance and/or decision support. However, to be used for these purposes, the data must first be transformed into a coded format that makes them computable. Because the cause of death on the certificate is reported as free text, encoding the data is currently the single largest barrier to using death certificates for surveillance. The purpose of this study was therefore to demonstrate the feasibility of using a pipeline, composed of a detection rule and a natural language processor, for the real-time encoding of death certificates, using the identification of pneumonia and influenza cases as an example, and to show that its accuracy is comparable to that of existing methods.
A Death Certificates Pipeline (DCP) was developed to automatically code death certificates and identify pneumonia and influenza cases. The pipeline used MetaMap to code death certificates from the Utah Department of Health for the year 2008. The MetaMap output was then processed by detection rules that flagged pneumonia and influenza cases based on the Centers for Disease Control and Prevention (CDC) case definition. The output of the DCP was compared with the method currently used by the CDC and with a keyword search. Recall, precision (positive predictive value), and F-measure were calculated for these two methods, using the CDC method as the reference. Compared with the CDC method, the two techniques achieved the following recall/precision: DCP, 0.998/0.98; keyword search, 0.96/0.96. The F-measures were 0.99 for the DCP and 0.96 for keyword search. Both the keyword search and the DCP can run interactively with modest computer resources, but the DCP showed superior performance.
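The evaluation above can be illustrated with a minimal Python sketch. It does not reproduce the DCP or MetaMap; it only shows, under stated assumptions, a keyword-search baseline of the kind compared in the study and how recall, precision, and F-measure are computed against a reference set of cases (here standing in for the CDC method). The regular expression, the certificate IDs, and the function names `keyword_flag`, `evaluate`, and `f_measure` are hypothetical and chosen for illustration. The F-measure is the balanced harmonic mean of precision and recall, so the reported recall/precision pairs can be checked directly: 0.998/0.98 gives about 0.99, and 0.96/0.96 gives 0.96.

```python
import re

# Hypothetical keyword rule (not the study's actual search terms): flag a
# certificate if any cause-of-death line contains "pneumonia" or "influenza".
# No word boundaries, so compound terms such as "Bronchopneumonia" also match.
KEYWORD_PATTERN = re.compile(r"pneumonia|influenza", re.IGNORECASE)


def keyword_flag(cause_lines: list[str]) -> bool:
    """Return True if any free-text cause-of-death line matches the keyword pattern."""
    return any(KEYWORD_PATTERN.search(line) for line in cause_lines)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(flagged: set[str], reference: set[str]) -> dict[str, float]:
    """Compare flagged certificate IDs with reference IDs (e.g. cases from the CDC method)."""
    tp = len(flagged & reference)   # flagged and in the reference set
    fp = len(flagged - reference)   # flagged but not in the reference set
    fn = len(reference - flagged)   # in the reference set but missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # precision == positive predictive value
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure(precision, recall)}


if __name__ == "__main__":
    # Toy, made-up certificates: ID -> free-text cause-of-death lines.
    certificates = {
        "c1": ["Acute respiratory failure", "Bronchopneumonia"],
        "c2": ["Myocardial infarction"],
        "c3": ["Influenza A"],
        "c4": ["Sepsis", "Aspiration pneumonia"],
    }
    flagged = {cid for cid, lines in certificates.items() if keyword_flag(lines)}
    reference = {"c1", "c3"}  # hypothetical reference labels
    print(sorted(flagged), evaluate(flagged, reference))
```

In this toy run the keyword rule flags `c4` while the hypothetical reference does not, which lowers precision but not recall. This is the kind of disagreement a pure string match cannot resolve and a concept-based pipeline such as the DCP is meant to reduce.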
The pipeline proposed here for coding death certificates and detecting cases is feasible and can be extended to other conditions. It offers an alternative that allows free-text death certificates to be coded in real time, which may increase their use not only in the public health domain but also by biomedical researchers and developers.
This study did not involve any clinical trials.