Chiu Billy, Majewska Olga, Pyysalo Sampo, Wey Laura, Stenius Ulla, Korhonen Anna, Palmer Martha
Language Technology Laboratory, MML, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK.
Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QW, UK.
J Biomed Semantics. 2019 Jan 18;10(1):2. doi: 10.1186/s13326-018-0193-x.
VerbNet, an extensive computational verb lexicon for English, has proved useful for supporting a wide range of Natural Language Processing tasks requiring information about the behaviour and meaning of verbs. Biomedical text processing and mining could benefit from a similar resource. We take the first step towards the development of BioVerbNet: a VerbNet specifically aimed at describing verbs in the area of biomedicine. Because VerbNet-style classification is extremely time-consuming, we start from a small manual classification of biomedical verbs and apply a state-of-the-art neural representation model, specifically developed for class-based optimization, to expand the classification with new verbs, using all the PubMed abstracts and the full articles in the PubMed Central Open Access subset as data.
Direct evaluation of the resulting classification against BioSimVerb (verb similarity judgement data in biomedicine) shows promising results when representation learning is performed using verb class-based contexts. Human validation by linguists and biologists reveals that the automatically expanded classification is highly accurate. Because the expanded classification includes novel, valid member verbs and classes, our method can be used to facilitate cost-effective development of BioVerbNet.
This work constitutes the first effort to apply a state-of-the-art architecture for neural representation learning to biomedical verb classification. While we discuss future optimization of the method, our promising results suggest that the automatic classification released with this article can readily support application tasks in biomedicine.