Suppr超能文献

标注与检测慢性阻塞性肺疾病的表型信息。

Annotating and detecting phenotypic information for chronic obstructive pulmonary disease.

作者信息

Ju Meizhi, Short Andrea D, Thompson Paul, Bakerly Nawar Diar, Gkoutos Georgios V, Tsaprouni Loukia, Ananiadou Sophia

机构信息

National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK.

Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK.

出版信息

JAMIA Open. 2019 Apr 26;2(2):261-271. doi: 10.1093/jamiaopen/ooz009. eCollection 2019 Jul.

Abstract

OBJECTIVES

Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information.

MATERIALS AND METHODS

Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory conditional random field (BiLSTM-CRF) network firstly recognizes nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognize enclosing phenotype mentions.

RESULTS

Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information.

DISCUSSION

Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments.

CONCLUSION

The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.

摘要

目的

慢性阻塞性肺疾病(COPD)的表型涵盖一系列肺部异常情况。为了使文本挖掘方法能够从文本数据中识别有关这些表型的相关且可能复杂的信息,我们开发了一个新颖的注释语料库,并用其训练基于神经网络的命名实体识别器,以检测细粒度的COPD表型信息。

材料与方法

由于COPD表型描述中常常会提及其中包含的其他概念(蛋白质、治疗方法等),因此我们的语料库注释既包括最外层的表型描述,也包括嵌套在其中的概念。我们的神经分层双向长短期记忆条件随机场(BiLSTM-CRF)网络首先识别嵌套提及,这些嵌套提及会被输入到后续的BiLSTM-CRF层,以帮助识别包含这些提及的表型。

结果

我们的30篇完整论文语料库(可在http://www.nactem.ac.uk/COPD获取)由专家注释了27030个与表型相关的概念提及,其中大部分已自动链接到UMLS元词表概念。当使用该语料库进行训练时,我们的BiLSTM-CRF网络在识别详细表型信息方面优于其他常用方法。

讨论

我们的方法提取的信息有助于高效定位和探索有关表型的详细信息,例如那些特别涉及对治疗反应的信息。

结论

我们的语料库通过成功用于训练分层BiLSTM-CRF网络以提取不同粒度级别的表型信息,证明了其对于开发提取COPD表型细粒度信息方法的重要性。训练所需的最少人工干预应允许其易于适应提取有关其他疾病的表型信息。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74a6/6951876/3318eecfeb2e/ooz009f1.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验