Effectiveness of lexico-syntactic pattern matching for ontology enrichment with clinical documents.

Author information

Liu K, Chapman W W, Savova G, Chute C G, Sioutos N, Crowley R S

Affiliation

Department of Biomedical Informatics, UPMC Cancer Pavilion, Suite 301, 5150 Centre Avenue, Pittsburgh, PA 15232, USA.

Publication information

Methods Inf Med. 2011;50(5):397-407. doi: 10.3414/ME10-01-0020. Epub 2010 Nov 8.

Abstract

OBJECTIVE

To evaluate the effectiveness of a lexico-syntactic pattern (LSP) matching method for ontology enrichment using clinical documents.

METHODS

Two domains were studied separately using the same methodology. We used radiology documents to enrich RadLex and pathology documents to enrich the National Cancer Institute Thesaurus (NCIT). Several known LSPs were used for semantic knowledge extraction. We first retrieved all sentences that contained LSPs across two large clinical repositories and examined the frequency of the LSPs. From this set, we randomly sampled LSP instances, which were examined by human judges. We used a two-step method to determine the utility of these patterns for enrichment. In the first step, domain experts annotated medically meaningful terms (MMTs) from each sentence within the LSP. In the second step, RadLex and NCIT curators evaluated how many of these MMTs could be added to the resource. To quantify the utility of this LSP method, we defined two evaluation metrics: suggestion rate (SR) and acceptance rate (AR). We used these measures to estimate the yield of concepts and relationships for each of the two domains.
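The retrieval, sampling, and rate steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not list the LSPs that were used, so the three cue phrases below are hypothetical stand-ins in the style of Hearst patterns. The sentence splitter is deliberately naive, and the function names and the denominators chosen for SR and AR are assumptions; the paper's exact definitions may differ.

```python
import random
import re

# Hypothetical cue phrases standing in for the "known LSPs";
# the abstract does not say which patterns the study used.
LSP_CUES = {
    "such_as": re.compile(r"\bsuch as\b", re.IGNORECASE),
    "including": re.compile(r"\bincluding\b", re.IGNORECASE),
    "and_other": re.compile(r"\b(?:and|or) other\b", re.IGNORECASE),
}


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def find_lsp_instances(text):
    """Return a (pattern_name, sentence) pair for each cue found in a sentence."""
    instances = []
    for sentence in split_sentences(text):
        for name, cue in LSP_CUES.items():
            if cue.search(sentence):
                instances.append((name, sentence))
    return instances


def sample_instances(instances, k, seed=0):
    """Draw up to k instances at random for human review (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(instances, min(k, len(instances)))


def rate(numerator, denominator):
    """Plain ratio that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def suggestion_rate(n_suggested, n_annotated):
    """Assumed SR: share of expert-annotated terms proposed as candidates."""
    return rate(n_suggested, n_annotated)


def acceptance_rate(n_accepted, n_suggested):
    """Assumed AR: share of proposed candidates that curators accept."""
    return rate(n_accepted, n_suggested)
```

In this sketch a sentence that contains two cues yields two instances, one per pattern, which keeps per-pattern frequency counts straightforward. A real pipeline would also need a clinical sentence splitter and noun-phrase extraction around each cue, both of which are omitted here.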

RESULTS

For NCIT, the concept SR was 24%, and the relationship SR was 65%. The concept AR was 21%, and the relationship AR was 14%. For RadLex, the concept SR was 37%, and the relationship SR was 55%. The concept AR was 11%, and the relationship AR was 44%.

CONCLUSION

The LSP matching method is effective for discovering concepts and concept relationships in biomedical domains.
