Siontis Konstantinos C, Bhopalwala Huzefa, Dewaswala Nakeya, Scott Christopher G, Noseworthy Peter A, Geske Jeffrey B, Ommen Steve R, Nishimura Rick A, Ackerman Michael J, Friedman Paul A, Arruda-Olson Adelaide M
Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota.
Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota.
Cardiovasc Digit Health J. 2021 Oct;2(5):264-269. doi: 10.1016/j.cvdhj.2021.05.005. Epub 2021 May 20.
The follow-up of implantable cardioverter-defibrillators (ICDs) generates large amounts of valuable structured and unstructured data embedded in device interrogation reports.
We aimed to build a natural language processing (NLP) model for automated capture of ICD-recorded events from device interrogation reports using a single-center cohort of patients with hypertrophic cardiomyopathy (HCM).
A total of 687 ICD interrogation reports from 247 HCM patients were included. Using a derivation set of 480 reports, we developed a rule-based NLP algorithm based on unstructured (free-text) data from the interpretation field of the ICD reports to identify sustained atrial and ventricular arrhythmias, and ICD therapies. A separate model based on structured numerical tabulated data was also developed. Both models were tested in a separate set of the 207 remaining ICD reports. Diagnostic performance was determined in reference to arrhythmia and ICD therapy annotations generated by expert manual review of the same reports.
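As an illustration of what a rule-based approach to free-text interpretation fields can look like, the following is a minimal sketch only. The abstract does not disclose the study's actual rules, so the keyword patterns, the negation list, the sentence splitting, and the name `extract_events` are all hypothetical assumptions made for this example.

```python
import re

# Hypothetical negation cues; not taken from the study.
NEGATION = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)

# Hypothetical keyword rules for the three event types named in the abstract.
RULES = {
    "ventricular_arrhythmia": re.compile(
        r"\b(ventricular tachycardia|ventricular fibrillation|VT|VF)\b",
        re.IGNORECASE,
    ),
    "atrial_arrhythmia": re.compile(
        r"\b(atrial fibrillation|atrial flutter|AF)\b", re.IGNORECASE
    ),
    "icd_therapy": re.compile(
        r"\b(shocks?|anti-?tachycardia pacing|ATP)\b", re.IGNORECASE
    ),
}


def extract_events(interpretation: str) -> set:
    """Return the event labels mentioned, and not negated, in a free-text field."""
    found = set()
    # Naive sentence split on '.' or ';' followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", interpretation):
        for label, pattern in RULES.items():
            match = pattern.search(sentence)
            # Skip a mention when a negation cue precedes it in the same sentence.
            if match and not NEGATION.search(sentence[: match.start()]):
                found.add(label)
    return found


if __name__ == "__main__":
    text = "One episode of sustained VT terminated by ATP. No atrial fibrillation."
    print(sorted(extract_events(text)))
```

A production system would need far richer handling of negation, abbreviations, and episode duration (for example, distinguishing sustained from non-sustained events), which this sketch does not attempt.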
The NLP system achieved sensitivities of 0.98 and 0.99 and F1-scores of 0.98 and 0.92 for arrhythmia and ICD therapy events, respectively. In contrast, the structured data model performed significantly worse, with sensitivities of 0.33 and 0.76 and F1-scores of 0.45 and 0.78 for arrhythmia and ICD therapy events, respectively.
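Because the F1-score is the harmonic mean of precision and sensitivity (recall), the precision implied by each reported pair can be back-calculated as P = F1 x R / (2R - F1). The sketch below does this arithmetic for the figures above. The function name `implied_precision` is an assumption for illustration, and since the reported values are rounded to two decimals, the results are approximate rather than figures reported by the study.

```python
def implied_precision(sensitivity: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, given recall R and F1."""
    return f1 * sensitivity / (2 * sensitivity - f1)


# (sensitivity, F1) pairs as reported in the abstract.
reported = {
    ("NLP", "arrhythmia"): (0.98, 0.98),
    ("NLP", "ICD therapy"): (0.99, 0.92),
    ("structured", "arrhythmia"): (0.33, 0.45),
    ("structured", "ICD therapy"): (0.76, 0.78),
}

if __name__ == "__main__":
    for (model, event), (sens, f1) in reported.items():
        precision = implied_precision(sens, f1)
        print(f"{model:10s} {event:12s} implied precision ~ {precision:.2f}")
```

Under this back-calculation, the NLP model's lower F1-score for ICD therapies (0.92 at sensitivity 0.99) corresponds to an implied precision of roughly 0.86, which suggests that false positives rather than missed events account for most of the gap.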
An automated NLP system can capture arrhythmia events and ICD therapies from unstructured device interrogation reports with high accuracy in patients with HCM. These findings demonstrate the feasibility of an NLP paradigm for extracting data for clinical care and research from ICD reports embedded in the electronic health record.