Liu Liyan, Shorstein Neal H, Amsden Laura B, Herrinton Lisa J
Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.
Department of Ophthalmology, Kaiser Permanente Northern California, Oakland, CA, USA.
Pharmacoepidemiol Drug Saf. 2017 Apr;26(4):378-385. doi: 10.1002/pds.4149. Epub 2017 Jan 3.
Antibiotic prophylaxis is critical to ophthalmology and other surgical specialties. We performed natural language processing (NLP) of 743 838 operative notes recorded for 315 246 surgeries to ascertain two variables needed to study the comparative effectiveness of antibiotic prophylaxis in cataract surgery. The first key variable was an exposure variable, intracameral antibiotic injection. The second was an intraoperative complication, posterior capsular rupture (PCR), which functioned as a potential confounder. To help other researchers use NLP in their settings, we describe our NLP protocol and lessons learned.
For each of the two variables, we used SAS Text Miner and other SAS text-processing modules with a training set of 10 000 (1.3%) operative notes to develop a lexicon. The lexica identified misspellings, abbreviations, and negations, and linked words into concepts (e.g., "antibiotic" linked with "injection"). We validated the NLP tools iteratively against random samples of 2000 (0.3%) notes, drawn with replacement.
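The lexicon idea described above (variant spellings, abbreviations, negation, and linking "antibiotic" with "injection") can be illustrated with a minimal Python sketch. This is not the SAS Text Miner workflow used in the study. The term lists, the function name `flag_injection`, and the window sizes are hypothetical choices made only for illustration.

```python
import re

# Hypothetical lexicon entries for illustration only; not the study's lexicon.
ANTIBIOTIC_TERMS = {"antibiotic", "antibiotics", "antibotic", "abx"}  # includes a misspelling and an abbreviation
INJECTION_TERMS = {"injection", "injected", "inj", "intracameral"}
NEGATION_TERMS = {"no", "not", "without"}


def tokenize(text):
    """Lowercase the note and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def flag_injection(note, window=5, neg_window=3):
    """Return True if an antibiotic term co-occurs with an injection term
    within `window` tokens and is not preceded by a negation term within
    `neg_window` tokens. Both window sizes are assumed values."""
    tokens = tokenize(note)
    for i, tok in enumerate(tokens):
        if tok not in ANTIBIOTIC_TERMS:
            continue
        # Negation check: look at the few tokens just before the antibiotic term.
        preceding = tokens[max(0, i - neg_window):i]
        if any(t in NEGATION_TERMS for t in preceding):
            continue
        # Concept linking: require an injection term near the antibiotic term.
        nearby = tokens[max(0, i - window):i + window + 1]
        if any(t in INJECTION_TERMS for t in nearby):
            return True
    return False
```

A real lexicon would be built and refined against a training set, as the abstract describes; a fixed token window like the one assumed here is only a crude stand-in for that process.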
The NLP tools identified approximately 60 000 intracameral antibiotic injections and 3500 cases of PCR. The positive and negative predictive values for intracameral antibiotic injection exceeded 99%. For PCR, both predictive values exceeded 94%.
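For readers unfamiliar with the metrics reported above, the following Python sketch shows how positive and negative predictive values could be computed from a validation sample in which each note has an NLP flag and a manually reviewed reference flag. The function names and the input layout are assumptions for illustration, and the abstract does not describe the study's actual validation code.

```python
import random


def sample_with_replacement(note_ids, k, seed=0):
    """Draw k note IDs at random with replacement (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return rng.choices(note_ids, k=k)


def predictive_values(pairs):
    """Compute (PPV, NPV) from (nlp_flag, reference_flag) boolean pairs.

    PPV = TP / (TP + FP); NPV = TN / (TN + FN).
    Returns None for a value whose denominator is zero.
    """
    tp = fp = tn = fn = 0
    for nlp_flag, reference_flag in pairs:
        if nlp_flag and reference_flag:
            tp += 1
        elif nlp_flag and not reference_flag:
            fp += 1
        elif not nlp_flag and reference_flag:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv
```

Because sampling is with replacement, the same note can appear in more than one validation round, which is consistent with the iterative sampling described in the methods.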
NLP was a valid and feasible method for obtaining critical variables needed for a research study of surgical safety. These NLP tools were intended for use in the study sample. Use with external datasets or future datasets in our own setting would require further testing. Copyright © 2017 John Wiley & Sons, Ltd.