Yu Sheng, Kumamaru Kanako K, George Elizabeth, Dunne Ruth M, Bedayat Arash, Neykov Matey, Hunsaker Andetta R, Dill Karin E, Cai Tianxi, Rybicki Frank J
Partners HealthCare Personalized Medicine, Brigham and Women's Hospital & Harvard Medical School, Boston, MA, United States.
Applied Imaging Science Laboratory, Department of Radiology, Brigham and Women's Hospital & Harvard Medical School, Boston, MA, United States.
J Biomed Inform. 2014 Dec;52:386-93. doi: 10.1016/j.jbi.2014.08.001. Epub 2014 Aug 10.
In this paper we describe an efficient tool based on natural language processing (NLP) for classifying the detailed state of pulmonary embolism (PE) recorded in CT pulmonary angiography reports. The classification tasks were: PE present vs. absent, acute PE vs. others, central PE vs. others, and subsegmental PE vs. others. Statistical learning algorithms were trained with features extracted using the NLP tool and with gold standard labels obtained through chart review by two radiologists. The areas under the receiver operating characteristic curves (AUCs) for the four tasks were 0.998, 0.945, 0.987, and 0.986, respectively. We compared our classifiers with bag-of-words Naive Bayes classifiers, a standard text mining technique, which gave AUCs of 0.942, 0.765, 0.766, and 0.712, respectively.
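The abstract names two technical ingredients: a bag-of-words Naive Bayes baseline and the AUC used to score every classifier. Below is a minimal, standard-library-only Python sketch of both. It assumes a multinomial Naive Bayes with Laplace smoothing (`alpha=1`) and naive whitespace tokenization. The function names (`tokenize`, `train_nb`, `score_nb`, `auc`) and the toy report snippets are hypothetical and not taken from the paper. The abstract does not describe the authors' exact baseline configuration.

```python
import math
from collections import Counter

# Hypothetical toy report snippets (NOT the study's data); 1 = PE present, 0 = absent.
TRAIN_DOCS = [
    "Acute pulmonary embolism in the right lower lobe segmental arteries.",
    "Filling defect in the left main pulmonary artery consistent with embolism.",
    "No evidence of pulmonary embolism.",
    "No filling defect. Negative study for embolism.",
]
TRAIN_LABELS = [1, 1, 0, 0]


def tokenize(text):
    """Lowercase whitespace tokenization with simple punctuation stripping."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes on bag-of-words counts.

    Assumes binary labels 0/1 with both classes present in `labels`.
    """
    vocab = set()
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for doc, y in zip(docs, labels):
        toks = tokenize(doc)
        counts[y].update(toks)  # word order is discarded: a pure bag of words
        vocab.update(toks)
    totals = {y: sum(counts[y].values()) for y in (0, 1)}
    log_prior = {y: math.log(n_docs[y] / len(labels)) for y in (0, 1)}
    return vocab, counts, totals, log_prior, alpha


def score_nb(model, doc):
    """Log-odds log P(y=1|doc) - log P(y=0|doc) under the fitted model."""
    vocab, counts, totals, log_prior, alpha = model
    v = len(vocab)
    s = {}
    for y in (0, 1):
        s[y] = log_prior[y]
        for tok in tokenize(doc):
            if tok in vocab:  # words never seen in training are ignored
                # Laplace-smoothed per-class word probability.
                s[y] += math.log((counts[y][tok] + alpha) / (totals[y] + alpha * v))
    return s[1] - s[0]


def auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    model = train_nb(TRAIN_DOCS, TRAIN_LABELS)
    test_docs = ["acute embolism in the right lobe", "no evidence of embolism"]
    scores = [score_nb(model, d) for d in test_docs]
    print(scores)
    print(auc(scores, [1, 0]))  # 1.0 on this toy pair
```

The pairwise loop in `auc` costs O(n_pos × n_neg), which is acceptable for a sketch. For large report collections, the equivalent rank-based (Mann-Whitney U) formula scales better.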