Shen Feichen, Liu Sijia, Fu Sunyang, Wang Yanshan, Henry Sam, Uzuner Ozlem, Liu Hongfang
Division of Digital Health Sciences, Mayo Clinic, Rochester, MN, United States.
Department of Information Sciences and Technology, George Mason University, Fairfax, VA, United States.
JMIR Med Inform. 2021 Jan 27;9(1):e24008. doi: 10.2196/24008.
As a risk factor for many diseases, family history (FH) captures both the genetic variations and the living environments shared among family members. Although several systems extract FH using natural language processing (NLP) techniques, no standardized protocol exists for evaluating them.
The n2c2/OHNLP (National NLP Clinical Challenges/Open Health Natural Language Processing) 2019 FH extraction task aims to encourage community efforts toward standardized evaluation and system development for FH extraction from synthetic clinical narratives.
We organized the first BioCreative/OHNLP FH extraction shared task in 2018. We continued the shared task in 2019 in collaboration with n2c2 and the OHNLP consortium, organizing the 2019 n2c2/OHNLP FH extraction track. The shared task comprises 2 subtasks. Subtask 1 focuses on identifying family member entities and clinical observations (diseases). Subtask 2 requires extracting the living status, the side of the family, and the clinical observations associated with each family member. Subtask 2 is an end-to-end task that builds on the output of subtask 1. We manually curated the first corpus of deidentified clinical narratives drawn from the FH sections of clinical notes at Mayo Clinic Rochester, whose content is highly relevant to patients' FH.
A total of 17 teams from around the world participated in the n2c2/OHNLP FH extraction shared task, submitting 38 runs for subtask 1 and 21 runs for subtask 2. For subtask 1, the top 3 runs were generated by Harbin Institute of Technology, ezDI, Inc., and The Medical University of South Carolina, with F1 scores of 0.8745, 0.8225, and 0.8130, respectively. For subtask 2, the top 3 runs were from Harbin Institute of Technology, ezDI, Inc., and University of Florida, with F1 scores of 0.681, 0.6586, and 0.6544, respectively. The workshop was held in conjunction with the AMIA 2019 Fall Symposium.
Teams used a wide variety of methods in both subtasks, including Bidirectional Encoder Representations from Transformers (BERT), convolutional neural networks, bidirectional long short-term memory, conditional random fields, support vector machines, and rule-based strategies. System performance shows that relation extraction from FH is more challenging than entity identification.
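One reason the subtask 2 scores trail the subtask 1 scores is that a relation counts as correct only when every one of its parts matches. The sketch below is a hypothetical, simplified micro-averaged precision/recall/F1 computed by exact matching over sets of (family member, side of family, observation) tuples. The tuple layout and the function name `micro_f1` are assumptions for illustration; this is not the official n2c2/OHNLP evaluation script.

```python
from typing import Set, Tuple

# Hypothetical relation layout: (family member, side of family, observation).
Relation = Tuple[str, str, str]


def micro_f1(gold: Set[Relation], predicted: Set[Relation]) -> Tuple[float, float, float]:
    """Exact-match micro precision, recall, and F1 over sets of extracted tuples.

    A predicted tuple is a true positive only if it matches a gold tuple
    in every field, so one wrong attribute makes it both a false positive
    and a false negative.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative, made-up annotations (not from the shared-task corpus).
    gold = {("Aunt", "Maternal", "breast cancer"), ("Uncle", "Paternal", "diabetes")}
    # Both family members and both diseases are found, but one side is wrong.
    predicted = {("Aunt", "Maternal", "breast cancer"), ("Uncle", "Maternal", "diabetes")}
    print(micro_f1(gold, predicted))
```

In this toy example every family member and observation is identified, yet one wrong side-of-family value drops precision, recall, and F1 to 0.5, which mirrors how errors in a multi-part relation compound relative to entity identification alone.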