Liu K, Chapman W W, Savova G, Chute C G, Sioutos N, Crowley R S
Department of Biomedical Informatics, UPMC Cancer Pavilion, Suite 301, 5150 Centre Avenue, Pittsburgh, PA 15232, USA.
Methods Inf Med. 2011;50(5):397-407. doi: 10.3414/ME10-01-0020. Epub 2010 Nov 8.
To evaluate the effectiveness of a lexico-syntactic pattern (LSP) matching method for ontology enrichment using clinical documents.
Two domains were studied separately using the same methodology. We used radiology documents to enrich RadLex and pathology documents to enrich the National Cancer Institute Thesaurus (NCIT). Several known LSPs were used for semantic knowledge extraction. We first retrieved all sentences that contained LSPs across two large clinical repositories and examined the frequency of the LSPs. From this set, we randomly sampled LSP instances, which were then examined by human judges. We used a two-step method to determine the utility of these patterns for enrichment. In the first step, domain experts annotated medically meaningful terms (MMTs) in each sentence containing an LSP. In the second step, RadLex and NCIT curators evaluated how many of these MMTs could be added to the resource. To quantify the utility of this LSP method, we defined two evaluation metrics: suggestion rate (SR) and acceptance rate (AR). We used these measures to estimate the yield of concepts and relationships for each of the two domains.
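The retrieval, frequency count, and random sampling steps described above can be illustrated with a minimal Python sketch. The cue phrases, function names, and example sentences are hypothetical: the abstract does not list the LSPs that were used, so a few common Hearst-style cues stand in for them here.

```python
import random
import re
from collections import Counter

# Hypothetical cue phrases for Hearst-style patterns.
# These are illustrative assumptions, not the study's actual LSP list.
LSP_CUES = {
    "such_as": re.compile(r"\bsuch as\b", re.IGNORECASE),
    "and_other": re.compile(r"\b(?:and|or) other\b", re.IGNORECASE),
    "including": re.compile(r"\bincluding\b", re.IGNORECASE),
}


def find_lsp_instances(sentences):
    """Return (pattern_name, sentence) pairs for every cue found in a sentence."""
    return [
        (name, sentence)
        for sentence in sentences
        for name, regex in LSP_CUES.items()
        if regex.search(sentence)
    ]


def pattern_frequency(instances):
    """Count how many instances each pattern produced."""
    return Counter(name for name, _ in instances)


def sample_instances(instances, k, seed=0):
    """Draw up to k instances at random for manual review (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(instances, min(k, len(instances)))


# Made-up example sentences, not taken from the clinical repositories.
sentences = [
    "Benign lesions such as lipoma were noted.",
    "No acute abnormality.",
    "Carcinoma and other malignant tumors were excluded.",
    "Findings including effusion and atelectasis are present.",
]

instances = find_lsp_instances(sentences)
frequency = pattern_frequency(instances)
review_sample = sample_instances(instances, k=2)
```

In this sketch a sentence that contains two different cues yields two instances, one per pattern. Whether the study counted instances that way is not stated in the abstract.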
For NCIT, the concept SR was 24%, and the relationship SR was 65%. The concept AR was 21%, and the relationship AR was 14%. For RadLex, the concept SR was 37%, and the relationship SR was 55%. The concept AR was 11%, and the relationship AR was 44%.
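One way to read these figures together is to multiply SR by AR. The sketch below does that under a loudly stated assumption: that SR is the share of annotated items suggested as new to the resource, and AR is the share of those suggestions the curators accepted. The abstract does not give the exact formulas, so the product is only a rough indication of end-to-end yield. The function name `combined_yield` is hypothetical.

```python
# Rates reported in the abstract, expressed as fractions.
REPORTED = {
    ("NCIT", "concept"): {"SR": 0.24, "AR": 0.21},
    ("NCIT", "relationship"): {"SR": 0.65, "AR": 0.14},
    ("RadLex", "concept"): {"SR": 0.37, "AR": 0.11},
    ("RadLex", "relationship"): {"SR": 0.55, "AR": 0.44},
}


def combined_yield(sr, ar):
    """Assumed composition: share suggested times share of suggestions accepted."""
    return sr * ar


yields = {
    key: combined_yield(rates["SR"], rates["AR"])
    for key, rates in REPORTED.items()
}

for (resource, kind), value in yields.items():
    print(f"{resource} {kind}: {value:.1%}")
```

Under this reading, RadLex relationships have the highest combined yield of the four categories and RadLex concepts the lowest.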
The LSP matching method is an effective method for concept and concept relationship discovery in biomedical domains.