Lee Jaehyun, Sharma Ishan, Arcaro Nichole, Blackstone Eugene H, Gillinov A Marc, Svensson Lars G, Karamlou Tara, Chen David
Cardiovascular Outcomes Research and Registries, Cleveland Clinic, Cleveland, OH 44195, United States.
Heart, Vascular, and Thoracic Institute, Cleveland Clinic, Cleveland, OH 44195, United States.
JAMIA Open. 2024 Jul 24;7(3):ooae054. doi: 10.1093/jamiaopen/ooae054. eCollection 2024 Oct.
Surgical registries play a crucial role in clinical knowledge discovery, hospital quality assurance, and quality improvement. However, maintaining a surgical registry requires significant monetary and human resources, given the wide gamut of information abstracted from medical records, ranging from patient co-morbidities to procedural details to post-operative outcomes. Although natural language processing (NLP) methods such as pretrained language models (PLMs) have promised to automate this process, substantial barriers to implementation remain. In particular, constant shifts in both the underlying data and the required registry content hinder the application of NLP technologies.
In our work, we evaluate the application of PLMs for automating the population of procedural elements in the Society of Thoracic Surgeons (STS) adult cardiac surgery (ACS) registry, an approach we term Cardiovascular Surgery Bidirectional Encoder Representations from Transformers (CS-BERT). CS-BERT was validated across multiple satellite sites and versions of the STS-ACS registry.
CS-BERT performed well on common cardiac surgery procedures (F1 score of 0.8417 ± 0.1838) compared with models based on diagnosis codes (F1 score of 0.6130 ± 0.0010). The model also generalized well to satellite sites and across different versions of the STS-ACS registry.
This study provides evidence that PLMs can be used to extract the more common cardiac surgery procedure variables in the STS-ACS registry, potentially reducing the need for expensive human annotation and enabling wide-scale dissemination. Further research is needed for rare procedural variables, which suffer from both a lack of data and variable documentation quality.
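As an illustration of how a summary such as "F1 score of 0.8417 ± 0.1838" might be produced, the sketch below computes a per-variable F1 score and then reports the mean and spread across variables. It assumes that the ± value is a standard deviation taken across procedure variables, which the abstract does not state. The variable names and labels are hypothetical and are not drawn from the study.

```python
from statistics import mean, stdev


def f1_score(y_true, y_pred):
    """F1 for one binary registry variable (1 = procedure performed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, F1 is undefined;
    # returning 0.0 here is a convention chosen for this sketch.
    return 2 * tp / denom if denom else 0.0


def summarize_f1(labels_true, labels_pred):
    """Return per-variable F1 plus the mean and sample SD across variables."""
    scores = {
        name: f1_score(labels_true[name], labels_pred[name])
        for name in labels_true
    }
    values = list(scores.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return scores, mean(values), spread


# Hypothetical gold-standard abstractions and model predictions.
gold = {
    "cabg": [1, 1, 0, 0, 1],
    "aortic_valve": [0, 1, 1, 0, 0],
}
predicted = {
    "cabg": [1, 1, 0, 1, 1],
    "aortic_valve": [0, 1, 0, 0, 0],
}

scores, avg, sd = summarize_f1(gold, predicted)
print(f"F1 = {avg:.4f} ± {sd:.4f}")
```

A large spread, as reported for CS-BERT, would be consistent with performance varying considerably between procedure variables, although the abstract itself does not break the score down by variable.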