De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1.

Author information

Simmons College, School of Library and Information Science, 300 The Fenway, Boston, MA 02115, United States.

University at Albany, United States.

Publication information

J Biomed Inform. 2017 Nov;75S:S4-S18. doi: 10.1016/j.jbi.2017.06.011. Epub 2017 Jun 11.

Abstract

The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records. This track tackled de-identification in two sub-tracks: Track 1.A was a "sight unseen" task, where nine teams ran existing de-identification systems, without any modifications or training, on 600 new records in order to gauge how well systems generalize to new data. The best-performing system for this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, where 15 teams had two months to train their systems on the new data, then test them on an unannotated test set. The best-performing system from this track scored an F1 of 0.914. The scores for Track 1.A show that unmodified existing systems do not generalize well to new data without the benefit of training data. The scores for Track 1.B are slightly lower than those of the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.
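
For readers unfamiliar with the F1 figures quoted above, the following is a minimal sketch of how an F1 score can be computed for a de-identification system. It assumes strict, entity-level matching, in which a predicted span counts as correct only if its offsets and category exactly match a gold annotation. The abstract does not state which matching criterion the shared task used, so this is an illustration of the metric rather than the organizers' evaluation script. The `Span` representation, the function name, and the toy data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation of one annotated identifier:
# (start offset, end offset, category label).
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict span matching (an assumption)."""
    # A prediction is a true positive only if the identical span is in the gold set.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only.
gold = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "CITY"), (70, 82, "PHONE")}
predicted = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "CITY"), (90, 95, "AGE")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the system finds three of the four gold identifiers and makes one spurious prediction, so precision, recall, and F1 are all 0.75. Because F1 is a harmonic mean, it is pulled toward the lower of the two component scores, which is why a system that misses many identifiers cannot compensate with high precision alone.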
