TurkuNLP Group, University of Turku, Turku, Finland.
BMC Bioinformatics. 2020 Dec 29;21(Suppl 23):580. doi: 10.1186/s12859-020-03905-8.
Background: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent shared tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been little study of parsing texts from specialized domains such as biomedicine.
Methods: We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. The CRAFT-SA task broadly follows the UD representation and recent UD task conventions, allowing us to fine-tune the UD-compatible Turku Neural Parser and UDify parsers to the task. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several models pre-trained specifically for biomedical text processing.
Results: We find that recently introduced neural parsing technology is capable of generating highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initialization using a deep transfer learning model pre-trained on in-domain texts is key to maximizing the performance of the parsing methods.
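Parser performance on UD-style data such as CRAFT-SA is conventionally reported with the labeled and unlabeled attachment scores (LAS/UAS) over CoNLL-U analyses. The following is a minimal illustrative sketch of that evaluation; the biomedical example sentence and the single predicted error are invented for demonstration and are not taken from the CRAFT corpus.

```python
# Illustrative sketch: LAS/UAS computation over gold vs. predicted CoNLL-U
# analyses. LAS counts tokens whose head AND dependency label are correct;
# UAS counts tokens whose head is correct regardless of label.

def read_conllu(text):
    """Extract (HEAD, DEPREL) per token, skipping comments,
    multiword-token ranges, and empty nodes, which are not scored."""
    rows = []
    for line in text.strip().splitlines():
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        rows.append((cols[6], cols[7]))  # HEAD and DEPREL columns
    return rows

def las_uas(gold, pred):
    assert len(gold) == len(pred), "token counts must match"
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return las, uas

# Invented example: "Protein kinases phosphorylate substrates."
gold = """\
1\tProtein\tprotein\tNOUN\t_\t_\t2\tcompound\t_\t_
2\tkinases\tkinase\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tphosphorylate\tphosphorylate\tVERB\t_\t_\t0\troot\t_\t_
4\tsubstrates\tsubstrate\tNOUN\t_\t_\t3\tobj\t_\t_
"""
# Simulate a prediction with one mislabeled (but correctly attached) edge.
pred = gold.replace("2\tcompound", "2\tamod")

las, uas = las_uas(read_conllu(gold), read_conllu(pred))
print(f"LAS={las:.2f} UAS={uas:.2f}")  # LAS=0.75 UAS=1.00
```

A single labeling error lowers LAS but not UAS, which is why both scores are typically reported together in UD evaluations.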