Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States.
Immunization Safety Office, Centers for Disease Control and Prevention, Atlanta, GA, United States.
JMIR Public Health Surveill. 2022 May 24;8(5):e30426. doi: 10.2196/30426.
Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, due to the difficulty of finding SIRVA cases in large health care databases, population-based studies are scarce.
The goal of the research was to develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes.
We conducted the study among members of a large integrated health care organization who were vaccinated between April 1, 2016, and December 31, 2017, and had subsequent diagnosis codes indicative of shoulder injury. Based on a training data set with a chart review reference standard of 164 cases, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality and causality. The algorithm identified 3 groups of positive SIRVA cases (definite, probable, and possible) based on the strength of evidence. We compared NLP results to a chart review reference standard of 100 vaccinated cases. We then applied the final automated NLP algorithm to a broader cohort of vaccinated persons with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases.
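As a rough illustration of how extracted evidence might be mapped to the three positive groups, the following is a minimal sketch and not the study's actual algorithm. The abstract does not give the decision rules, so the `ShoulderEvidence` fields, the `classify` function, and the counting rule (number of supporting evidence types) are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class ShoulderEvidence:
    """Hypothetical flags an NLP step might extract from a clinical note."""
    prior_vaccination: bool        # note mentions a preceding vaccination
    same_arm: bool                 # anatomic location matches the injected arm
    onset_after_vaccination: bool  # temporality: symptoms began after the shot
    causal_statement: bool         # causality: note attributes injury to the shot


def classify(evidence: ShoulderEvidence) -> str:
    """Assign a tier by strength of evidence (assumed rule, for illustration)."""
    if not evidence.prior_vaccination:
        return "negative"
    # Count the supporting evidence types beyond the vaccination mention.
    supporting = sum([
        evidence.same_arm,
        evidence.onset_after_vaccination,
        evidence.causal_statement,
    ])
    tiers = {3: "definite", 2: "probable", 1: "possible"}
    return tiers.get(supporting, "negative")


if __name__ == "__main__":
    note = ShoulderEvidence(True, True, True, False)
    print(classify(note))
```

A real system would derive these flags from note text and would likely weight the evidence types differently; the equal-weight count here is only a placeholder.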
In the validation sample, the NLP algorithm correctly classified all 100 cases (4 with SIRVA and 96 without), for 100% accuracy. In the broader cohort of 53,585 vaccinations, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.5% (278/291), 67.7% (84/124), and 17.3% (9/52), respectively.
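The chart-confirmation rates can be reproduced from the reported counts. The short sketch below uses only the numbers given in the results; the variable and function names are illustrative.

```python
# (chart-confirmed cases, NLP-identified cases) per group, as reported
counts = {
    "definite": (278, 291),
    "probable": (84, 124),
    "possible": (9, 52),
}


def confirmation_rate(confirmed: int, identified: int) -> float:
    """Percentage of NLP-identified cases confirmed by chart review."""
    return 100.0 * confirmed / identified


rates = {
    group: round(confirmation_rate(confirmed, identified), 1)
    for group, (confirmed, identified) in counts.items()
}
total_identified = sum(identified for _, identified in counts.values())

if __name__ == "__main__":
    for group, rate in rates.items():
        print(f"{group}: {rate}%")
    print(f"NLP-identified cases in total: {total_identified}")
```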
The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation.