Perelman School of Medicine at the University of Pennsylvania, 801 S 24th St #3, Philadelphia, PA, 19146, USA.
Hospital of the University of Pennsylvania, Philadelphia, PA, USA.
J Digit Imaging. 2020 Feb;33(1):131-136. doi: 10.1007/s10278-019-00271-7.
While radiologists regularly issue follow-up recommendations, our preliminary research has shown that 35 to 50% of patients who receive follow-up recommendations for findings of possible cancer on abdominopelvic imaging do not return for follow-up. As such, they remain at risk for adverse outcomes related to missed or delayed cancer diagnosis. In this study, we develop an algorithm to automatically detect free-text radiology reports that contain a follow-up recommendation, using natural language processing (NLP) techniques and machine learning models. The data set used in this study consists of 6000 free-text reports from the authors' institution. NLP techniques are used to engineer 1500 features, which include the most informative unigrams, bigrams, and trigrams in the training corpus after tokenization and Porter stemming. On this data set, we train naive Bayes, decision tree, and maximum entropy models. The decision tree model, with an F1 score of 0.458 and accuracy of 0.862, outperforms both the naive Bayes (F1 score of 0.381) and maximum entropy (F1 score of 0.387) models. The models were analyzed to determine predictive features, with the term frequency of n-grams such as "renal neoplasm" and "evalu with enhanc" being most predictive of a follow-up recommendation. Key to maximizing performance were feature engineering that extracts predictive information and appropriate selection of machine learning algorithms based on the feature set.
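The n-gram feature engineering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`tokenize`, `ngrams`, `build_vocabulary`, `featurize`) and the three toy reports are hypothetical, Porter stemming is omitted to keep the example dependency-free (a stemmer such as NLTK's `PorterStemmer` could be applied to each token, which is where stems like "evalu" and "enhanc" would come from), and ranking n-grams by document frequency is an assumed stand-in for the "most informative" criterion, which the abstract does not specify.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n=3):
    """Return unigrams, bigrams, and trigrams as space-joined strings."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(reports, size):
    """Keep the `size` n-grams that appear in the most reports.

    Assumption: document frequency stands in for the paper's
    unspecified "most informative" selection criterion.
    """
    doc_freq = Counter()
    for text in reports:
        doc_freq.update(set(ngrams(tokenize(text))))
    # Sort by descending document frequency, then alphabetically,
    # so that ties are broken deterministically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:size]]


def featurize(text, vocabulary):
    """Map a report to a vector of term frequencies over the vocabulary."""
    counts = Counter(ngrams(tokenize(text)))
    return [counts[gram] for gram in vocabulary]


# Toy reports, invented for illustration only.
reports = [
    "Possible renal neoplasm. Recommend follow-up MRI.",
    "No acute abnormality.",
    "Indeterminate renal lesion, recommend follow-up CT.",
]
vocabulary = build_vocabulary(reports, size=5)
vectors = [featurize(report, vocabulary) for report in reports]
```

In the study's setting the vocabulary size would be 1500 and the resulting term-frequency vectors would be passed to the naive Bayes, decision tree, and maximum entropy classifiers.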