Annotating and detecting phenotypic information for chronic obstructive pulmonary disease.

Authors

Ju Meizhi, Short Andrea D, Thompson Paul, Bakerly Nawar Diar, Gkoutos Georgios V, Tsaprouni Loukia, Ananiadou Sophia

Affiliations

National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK.

Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK.

Publication information

JAMIA Open. 2019 Apr 26;2(2):261-271. doi: 10.1093/jamiaopen/ooz009. eCollection 2019 Jul.

DOI:10.1093/jamiaopen/ooz009

PMID:31984360

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6951876/

Abstract

OBJECTIVES

Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information.

MATERIALS AND METHODS

Because COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both the outermost phenotype descriptions and the concepts nested within them. Our layered neural network, built from bidirectional long short-term memory conditional random field (BiLSTM-CRF) layers, first recognizes nested mentions; these are then fed into subsequent BiLSTM-CRF layers to help recognize the enclosing phenotype mentions.
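The inside-out control flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: a toy dictionary lookup stands in for each trained BiLSTM-CRF layer, and the labels (`Protein`, `Drug`, `Phenotype`, `Treatment`), the lexicons, and the example sentence are all hypothetical. It only shows how mentions found by one layer are collapsed and handed to the next layer so that enclosing mentions can be recognized.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, int, int]   # (symbol, first token index, end token index exclusive)
Span = Tuple[int, int, str]   # (start token, end token exclusive, label)
Lexicon = Dict[Tuple[str, ...], str]


def tag_layer(units: List[Unit], lexicon: Lexicon) -> List[Tuple[int, int, str]]:
    """Toy stand-in for one BiLSTM-CRF layer (assumption): greedy longest
    match over unit symbols. Returns (first unit, end unit exclusive, label)."""
    found = []
    i = 0
    while i < len(units):
        for j in range(len(units), i, -1):
            label = lexicon.get(tuple(u[0] for u in units[i:j]))
            if label is not None:
                found.append((i, j, label))
                i = j
                break
        else:
            i += 1  # no entity starts at this unit
    return found


def merge(units: List[Unit], found: List[Tuple[int, int, str]]) -> List[Unit]:
    """Collapse each detected mention into one unit whose symbol is <Label>."""
    starts = {s: (e, label) for s, e, label in found}
    merged, i = [], 0
    while i < len(units):
        if i in starts:
            e, label = starts[i]
            merged.append((f"<{label}>", units[i][1], units[e - 1][2]))
            i = e
        else:
            merged.append(units[i])
            i += 1
    return merged


def layered_ner(tokens: List[str], lexicons: List[Lexicon]) -> List[Span]:
    """Run the layers innermost-first, feeding each layer's output to the next."""
    units: List[Unit] = [(t.lower(), k, k + 1) for k, t in enumerate(tokens)]
    spans: List[Span] = []
    for lexicon in lexicons:
        found = tag_layer(units, lexicon)
        if not found:
            break  # nothing new detected, so no further layer is needed
        spans.extend((units[s][1], units[e - 1][2], label) for s, e, label in found)
        units = merge(units, found)
    return spans


# Hypothetical example sentence and lexicons, for illustration only.
tokens = "Patients showed increased IL-8 levels after corticosteroid treatment".split()
inner = {("il-8",): "Protein", ("corticosteroid",): "Drug"}
outer = {("increased", "<Protein>", "levels"): "Phenotype",
         ("<Drug>", "treatment"): "Treatment"}
spans = layered_ner(tokens, [inner, outer])
print(spans)
```

In this sketch the second layer can only match `increased <Protein> levels` because the first layer has already replaced `IL-8` with a single `<Protein>` unit, which mirrors the idea of nested mentions helping to recognize the mentions that enclose them.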

RESULTS

Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information.

DISCUSSION

Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments.
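As a rough illustration of this kind of exploration, the sketch below filters nested annotations for phenotype mentions that enclose a drug mention. The span format, the label names, and the hand-written example annotations are assumptions made for the example; they are not taken from the corpus or from the system's output.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def phenotypes_mentioning(spans: List[Span], inner_label: str,
                          outer_label: str = "Phenotype") -> List[Span]:
    """Return outer-label spans that enclose at least one inner-label span."""
    inner = [(s, e) for s, e, label in spans if label == inner_label]
    return [
        (s, e, label)
        for s, e, label in spans
        if label == outer_label
        and any(s <= i_s and i_e <= e for i_s, i_e in inner)
    ]


# Hypothetical sentence with hand-written nested annotations.
tokens = ("Dyspnea improved after tiotropium therapy "
          "but elevated CRP levels persisted").split()
annotations: List[Span] = [
    (0, 5, "Phenotype"),   # "Dyspnea improved after tiotropium therapy"
    (3, 4, "Drug"),        # "tiotropium"
    (6, 9, "Phenotype"),   # "elevated CRP levels"
    (7, 8, "Protein"),     # "CRP"
]
hits = phenotypes_mentioning(annotations, "Drug")
for start, end, label in hits:
    print(label, " ".join(tokens[start:end]))
```

Only the first phenotype mention is returned here, because it is the only one with a `Drug` span nested inside it.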

CONCLUSION

The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated by its successful use in training a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. Because training requires minimal human intervention, the approach should be readily adaptable to extracting phenotypic information about other diseases.
