Improving chemical entity recognition through h-index based semantic similarity.

Affiliation

LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal.

Publication information

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S13. doi: 10.1186/1758-2946-7-S1-S13. eCollection 2015.

Abstract

BACKGROUND

Our approach to the BioCreative IV challenge on the recognition and classification of drug names (CHEMDNER task) aimed to achieve high precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that chemical entities mentioned in the same fragment of text should share some semantic relation. We further improved this validation method by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this adaptation to two measures, simUI and simGIC, and validated it on the results obtained for the competition, comparing each adapted measure to its original version.
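As a rough illustration of the two measures named above, the sketch below computes simUI (shared ancestors over all ancestors) and simGIC (the same ratio weighted by information content) on a toy hierarchy, with an optional ancestor filter based on an h-index. The abstract does not say how the h-index of an ancestor is defined or applied, so the definition used here (the largest h such that a term has h children with at least h descendants each) and the filtering by a minimum h are assumptions, as are the toy terms, the structure-based information content, and every function name.

```python
from math import log

# Toy is-a hierarchy (child -> parents). Illustrative names, not real ChEBI entries.
PARENTS = {
    "root": [],
    "acid": ["root"],
    "alcohol": ["root"],
    "carboxylic_acid": ["acid"],
    "acetic_acid": ["carboxylic_acid"],
    "formic_acid": ["carboxylic_acid"],
    "ethanol": ["alcohol"],
}
CHILDREN = {term: [] for term in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def closure(term, edges):
    """Return the term plus everything reachable from it through `edges`."""
    seen, stack = {term}, [term]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ancestors(term):
    return closure(term, PARENTS)


def descendants(term):
    return closure(term, CHILDREN)


def ic(term):
    """Structure-based information content (assumption: no corpus counts)."""
    return -log(len(descendants(term)) / len(PARENTS))


def h_index(term):
    """ASSUMED definition: largest h with h children having >= h descendants."""
    counts = sorted((len(descendants(c)) for c in CHILDREN[term]), reverse=True)
    h = 0
    for rank, n in enumerate(counts, start=1):
        if n >= rank:
            h = rank
    return h


def kept_ancestors(term, min_h):
    """Ancestors whose (assumed) h-index is at least min_h; min_h=0 keeps all."""
    return {t for t in ancestors(term) if h_index(t) >= min_h}


def sim_ui(a, b, min_h=0):
    """Jaccard index over the (optionally filtered) ancestor sets."""
    A, B = kept_ancestors(a, min_h), kept_ancestors(b, min_h)
    return len(A & B) / len(A | B) if A | B else 0.0


def sim_gic(a, b, min_h=0):
    """Like sim_ui, but each ancestor is weighted by its information content."""
    A, B = kept_ancestors(a, min_h), kept_ancestors(b, min_h)
    total = sum(ic(t) for t in A | B)
    return sum(ic(t) for t in A & B) / total if total > 0 else 0.0


if __name__ == "__main__":
    for min_h in (0, 1):
        print(min_h,
              sim_ui("acetic_acid", "formic_acid", min_h),
              sim_gic("acetic_acid", "formic_acid", min_h))
```

With `min_h=0` the functions reduce to the unadapted measures, which makes it easy to compare each adapted measure with its original version on the same pair of terms.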

RESULTS

For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we enhanced our validation process so that, for a fixed recall, precision increased because more false positives were excluded from the results. We plotted precision and recall values for a range of validation thresholds using different similarity measures, and obtained higher precision for the same recall with the measures based on the h-index.
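The threshold sweep described above can be sketched as follows: each candidate entity carries a validation score, candidates scoring below the threshold are discarded, and precision and recall are recomputed at each threshold. The scores and gold labels below are made-up toy values, not results from the paper, and the function name is hypothetical.

```python
def precision_recall(candidates, threshold):
    """Precision and recall after keeping candidates with score >= threshold.

    `candidates` is a list of (score, is_true_entity) pairs.
    """
    kept = [is_true for score, is_true in candidates if score >= threshold]
    true_positives = sum(kept)
    total_true = sum(is_true for _, is_true in candidates)
    # Convention (an assumption): precision is 1.0 when nothing is kept.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_true if total_true else 0.0
    return precision, recall


# Toy (similarity score, gold label) pairs; purely illustrative.
CANDIDATES = [
    (0.9, True), (0.8, True), (0.6, False),
    (0.5, True), (0.2, False), (0.1, False),
]

if __name__ == "__main__":
    for threshold in (0.0, 0.3, 0.55, 0.7):
        p, r = precision_recall(CANDIDATES, threshold)
        print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

Running the sweep once per similarity measure and comparing precision at matching recall values gives the kind of comparison the results describe.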

CONCLUSIONS

The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7496/4331689/9a3b7d3e0f8e/1758-2946-7-S1-S13-1.jpg
