Vanderbilt University School of Medicine, Nashville, Tennessee, United States.
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States.
Appl Clin Inform. 2021 Jan;12(1):170-178. doi: 10.1055/s-0041-1723024. Epub 2021 Mar 10.
This study examines the validity of optical mark recognition, a novel user interface, and crowdsourced data validation to rapidly digitize and extract data from paper COVID-19 assessment forms at a large medical center.
An optical mark recognition/optical character recognition (OMR/OCR) system was developed to identify which fields were selected on 2,814 paper assessment forms, each containing 141 fields used to assess potential COVID-19 infections. To ease data validation, a novel user interface (UI) displayed mirrored forms: the scanned assessment form with OMR results superimposed on the left, and an editable web form on the right. Crowdsourced participants validated the results of the OMR system. Overall error rate and time taken to validate were calculated. A subset of forms was validated by multiple participants to calculate agreement between participants.
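The abstract does not describe how the OMR step decides that a field is selected. One common approach, shown here only as a minimal sketch and not as the authors' method, is to measure the fraction of dark pixels inside a field's bounding box and compare it with a threshold. The function names, the `(top, left, bottom, right)` box layout, and both threshold values are assumptions made for illustration.

```python
def fill_ratio(image, box, dark_threshold=128):
    """Return the fraction of pixels in `box` darker than `dark_threshold`.

    `image` is a grayscale image as a list of rows (0 = black, 255 = white).
    `box` is (top, left, bottom, right), with bottom/right exclusive.
    Both conventions are assumptions for this sketch.
    """
    top, left, bottom, right = box
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not pixels:
        raise ValueError("bounding box is empty")
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels)


def is_marked(image, box, min_fill=0.4, dark_threshold=128):
    """Treat a field as selected when enough of its box is dark.

    `min_fill` is a hypothetical tuning parameter, not a value from the study.
    """
    return fill_ratio(image, box, dark_threshold) >= min_fill


# A 4x4 white page with three dark pixels inside one checkbox region.
page = [[255] * 4 for _ in range(4)]
page[1][1] = page[1][2] = page[2][1] = 0

checkbox = (1, 1, 3, 3)      # 2x2 region, 3 of 4 pixels dark
empty_corner = (0, 0, 2, 2)  # 2x2 region, 1 of 4 pixels dark
print(is_marked(page, checkbox), is_marked(page, empty_corner))
```

A fixed threshold like this is sensitive to scan quality and stray pen marks, which is one reason a human validation pass over the OMR output is useful.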
The OMR/OCR tools correctly extracted data from scanned form fields with a mean accuracy of 70% and a median accuracy of 78%, using the crowd-validated results as the reference. Scanned forms were crowd-validated at a mean of 157 seconds per document and a volume of approximately 108 documents per day. A randomly selected subset of documents was reviewed by multiple participants, producing an interobserver agreement of 97% when narrative-text fields were included and 98% when only Boolean and multiple-choice fields were considered.
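The accuracy and agreement figures above can be illustrated with a short sketch. It assumes, hypothetically, that each form is stored as a dictionary mapping field names to values, that accuracy is the share of fields where the OMR/OCR output equals the crowd-validated value, and that interobserver agreement is simple percent agreement over shared fields. The abstract does not specify the exact definitions used, and the field names and data below are invented for illustration.

```python
from statistics import mean, median


def form_accuracy(omr_fields, validated_fields):
    """Share of validated fields whose OMR/OCR value matches the crowd value."""
    if not validated_fields:
        raise ValueError("no validated fields to compare against")
    matches = sum(
        1 for name, value in validated_fields.items()
        if omr_fields.get(name) == value
    )
    return matches / len(validated_fields)


def percent_agreement(reviewer_a, reviewer_b):
    """Share of fields, among those both reviewers filled, with equal values."""
    shared = reviewer_a.keys() & reviewer_b.keys()
    if not shared:
        raise ValueError("reviewers share no fields")
    agree = sum(1 for name in shared if reviewer_a[name] == reviewer_b[name])
    return agree / len(shared)


# Invented example data: (OMR/OCR output, crowd-validated values) per form.
forms = [
    ({"fever": True, "cough": False}, {"fever": True, "cough": True}),
    ({"fever": False, "cough": False}, {"fever": False, "cough": False}),
    ({"fever": True, "cough": True}, {"fever": True, "cough": True}),
]
accuracies = [form_accuracy(omr, truth) for omr, truth in forms]
print(mean(accuracies), median(accuracies))

agreement = percent_agreement(
    {"fever": True, "cough": True, "notes": "none"},
    {"fever": True, "cough": True, "notes": "n/a"},
)
print(agreement)
```

In this toy data a single poorly recognized form pulls the mean below the median, which is one possible reading of the gap between the reported 70% mean and 78% median.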
During the COVID-19 pandemic, it may be challenging for health care workers wearing personal protective equipment to interact with electronic health records. The combination of OMR/OCR technology, a novel UI, and crowdsourced data validation allowed for the efficient extraction of data from a large volume of paper medical documents produced during the pandemic.