Simmons College, School of Library and Information Science, 300 The Fenway, Boston, MA 02115, United States.
University at Albany, United States.
J Biomed Inform. 2017 Nov;75S:S4-S18. doi: 10.1016/j.jbi.2017.06.011. Epub 2017 Jun 11.
The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records. This track tackled de-identification in two sub-tracks. Track 1.A was a "sight unseen" task, in which nine teams ran existing de-identification systems, without any modification or training, on 600 new records in order to gauge how well systems generalize to new data. The best-performing system in this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, in which 15 teams had two months to train their systems on the new data and then tested them on an unannotated test set. The best-performing system in this track scored an F1 of 0.914. The Track 1.A scores show that unmodified existing systems do not generalize well to new data without the benefit of training data. The Track 1.B scores are slightly lower than those of the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.