Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, MA.
CodaMetrix, Boston, MA.
JCO Clin Cancer Inform. 2020 Oct;4:865-874. doi: 10.1200/CCI.20.00028.
Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations.
We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB).
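The abstract names the two hybrid systems without describing how they work, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that Rule as Feature appends the rule's output to the feature vector given to the ML model, and that Classifier Confidence keeps the ML prediction when the classifier is confident and otherwise falls back on the rule. The rule, the vocabulary, the 0.8 threshold, and all function names are hypothetical.

```python
import re

def rule_predict(segment: str) -> int:
    """Hypothetical hand-crafted rule: 1 if the segment reports carcinoma, else 0."""
    text = segment.lower()
    # Negated mentions ("negative for carcinoma") count as absent.
    if re.search(r"\b(no|negative for)\b[^.]*\bcarcinoma\b", text):
        return 0
    return 1 if re.search(r"\b(adeno)?carcinoma\b", text) else 0

def bag_of_words(segment: str, vocabulary: list) -> list:
    """Binary bag-of-words features over a fixed (hypothetical) vocabulary."""
    tokens = set(re.findall(r"[a-z0-9+]+", segment.lower()))
    return [1.0 if word in tokens else 0.0 for word in vocabulary]

def rule_as_feature(segment: str, vocabulary: list) -> list:
    """Assumed reading of Rule as Feature: the rule's output becomes one extra
    input feature, so any classifier (LR, SVM, XGB) can learn how far to trust it."""
    return bag_of_words(segment, vocabulary) + [float(rule_predict(segment))]

def classifier_confidence(prob_positive: float, rule_label: int,
                          threshold: float = 0.8) -> int:
    """Assumed reading of Classifier Confidence: keep the ML prediction when the
    classifier is confident, otherwise defer to the rule's label."""
    confidence = max(prob_positive, 1.0 - prob_positive)
    if confidence >= threshold:
        return 1 if prob_positive >= 0.5 else 0
    return rule_label

if __name__ == "__main__":
    vocab = ["gleason", "benign"]
    segment = "Prostatic adenocarcinoma, Gleason 3+4"
    print(rule_as_feature(segment, vocab))
    # prob_positive would come from a trained classifier; 0.55 is a made-up value.
    print(classifier_confidence(0.55, rule_predict(segment)))
```

Under these assumptions, both strategies let a model trained on few annotations lean on the rules: the first lets the learner weight the rule output like any other feature, while the second uses the rule only for segments where the classifier is uncertain.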
When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains.
We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.