Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, PR China.
Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, PR China.
Int J Med Inform. 2019 Apr;124:6-12. doi: 10.1016/j.ijmedinf.2019.01.004. Epub 2019 Jan 7.
To develop a natural language processing (NLP)-based algorithm for extracting clinically useful information on patients with hepatocellular carcinoma (HCC) from Chinese electronic medical records (EMRs), and to use the extracted data to assess HCC staging.
Clinical documents, including operation notes, radiology and pathology reports, of 92 HCC patients were collected from Chinese EMRs. We randomly divided these patients into training (n = 60) and testing (n = 32) datasets. Rule-based and hybrid methods for extracting information were developed using the manually annotated operation notes in the training set. The better-performing method was then used to process the other documents. Algorithm performance was assessed by calculating precision, recall and F-score under exact-boundary and partial-boundary matching strategies. The utility of the extracted information for HCC staging was assessed by comparison with staging based on manual review.
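The two matching strategies can be illustrated with a minimal Python sketch. It assumes each annotation is a `(start, end, label)` character span, that an exact-boundary match requires identical offsets and label, and that a partial-boundary match requires only an overlap with the same label. The span representation, the function names and the toy labels are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Hypothetical representation: (start offset, end offset exclusive, entity label)
Span = Tuple[int, int, str]


def spans_match(pred: Span, gold: Span, partial: bool) -> bool:
    """Return True if a predicted span counts as a hit for a gold span."""
    if pred[2] != gold[2]:
        return False  # labels must agree under both strategies
    if partial:
        # Partial-boundary: any character overlap is enough
        return pred[0] < gold[1] and gold[0] < pred[1]
    # Exact-boundary: both offsets must be identical
    return pred[0] == gold[0] and pred[1] == gold[1]


def score(predicted: List[Span], gold: List[Span], partial: bool = False):
    """Compute (precision, recall, F-score); each gold span is matched at most once."""
    used = set()
    true_positives = 0
    for p in predicted:
        for i, g in enumerate(gold):
            if i not in used and spans_match(p, g, partial):
                used.add(i)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data for illustration only (not from the study)
gold_spans = [(0, 4, "tumor_size"), (10, 15, "tumor_number"), (20, 26, "vascular_invasion")]
pred_spans = [(0, 4, "tumor_size"), (11, 15, "tumor_number"), (30, 33, "tumor_size")]

exact_scores = score(pred_spans, gold_spans, partial=False)
partial_scores = score(pred_spans, gold_spans, partial=True)
```

In this toy case the second prediction overlaps its gold span but misses the start offset, so it counts as a hit only under partial-boundary matching. This is why partial-boundary scores are never lower than exact-boundary scores for the same output.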
For operation notes, both the rule-based and hybrid methods achieved a precision, recall and F-score ≥80% on the testing dataset under both the exact-boundary and partial-boundary matching strategies. The rule-based method, which outperformed the hybrid method, also performed well on the three other types of documents. When the extracted information was used for HCC staging, the concordance rate with manual review was 75%.
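A concordance rate of this kind is commonly computed as the fraction of patients whose automatically derived stage equals the manually assigned stage. The sketch below assumes that definition. The function name, the stage labels and the four-patient toy data are placeholders and do not reproduce the study's staging system or results.

```python
from typing import Sequence


def concordance_rate(auto_stages: Sequence[str], manual_stages: Sequence[str]) -> float:
    """Fraction of patients for whom both staging sources agree."""
    if len(auto_stages) != len(manual_stages):
        raise ValueError("both sequences must cover the same patients")
    if not auto_stages:
        raise ValueError("at least one patient is required")
    agreements = sum(a == m for a, m in zip(auto_stages, manual_stages))
    return agreements / len(auto_stages)


# Toy data for illustration only: placeholder stage labels for four patients
auto = ["A", "B", "B", "C"]
manual = ["A", "B", "C", "C"]
rate = concordance_rate(auto, manual)
```

Here three of the four toy patients agree, so `rate` is 0.75.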
An NLP system was developed for clinical information extraction and HCC staging based on EMRs, and the results indicate that Chinese NLP has potential utility in clinical research.