Liu Rey-Long
Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, Republic of China.
J Biomed Semantics. 2019 Jan 7;10(1):1. doi: 10.1186/s13326-018-0194-9.
Conclusive association entities (CAEs) in a biomedical article are those biomedical entities (e.g., genes, diseases, and chemicals) that are specifically involved in the associations the article concludes. Identifying CAEs among the candidate entities in an article's title and abstract is essential for curating and exploring conclusive findings in biomedical literature. However, identification is challenging, because it is difficult to determine through semantic analysis whether an entity is a specific target on which the reported findings are conclusive enough.
We investigate how five types of statistical indicators can contribute to prioritizing the candidate entities so that CAEs are ranked at the top for exploratory analysis. The indicators operate on the titles and abstracts of articles. They are evaluated against the CAEs that biomedical experts designated when curating the entity associations concluded in articles. The indicators differ significantly in how well they rank these expert-identified CAEs. Some indicators do not perform well in CAE identification, even though they have been used in many techniques for article retrieval and keyword extraction. Learning-based fusion of certain indicators can further improve performance. Most articles have at least one of their CAEs ranked in the top two positions. The CAEs can be visualized to support exploratory analysis of the conclusive results that involve them.
With proper fusion of the statistical indicators, CAEs in biomedical articles can be identified for exploratory analysis. The results are essential for the indexing of biomedical articles to support validation of highly related conclusive findings in biomedical literature.
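The ranking and top-2 evaluation described above can be illustrated with a minimal sketch. The abstract does not name the five indicators, so the three indicators below (`in_title`, `abstract_count`, `in_last_sentence`) are hypothetical stand-ins, the hand-set weights merely stand in for the learning-based fusion, and the candidate entities are toy data for illustration only, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    in_title: bool          # hypothetical indicator: entity appears in the title
    abstract_count: int     # hypothetical indicator: occurrences in the abstract
    in_last_sentence: bool  # hypothetical indicator: entity appears in the final sentence


def fused_score(c, weights=(2.0, 1.0, 1.5)):
    """Weighted sum of the indicators; the weights are assumed, not learned."""
    w_title, w_count, w_last = weights
    return w_title * c.in_title + w_count * c.abstract_count + w_last * c.in_last_sentence


def rank_candidates(candidates, weights=(2.0, 1.0, 1.5)):
    """Sort candidate entities by fused score, highest first."""
    return sorted(candidates, key=lambda c: fused_score(c, weights), reverse=True)


def hit_at_k(ranked, gold, k=2):
    """True if at least one expert-designated CAE appears in the top k."""
    return any(c.name in gold for c in ranked[:k])


# Toy candidates for a single article (illustrative values only).
candidates = [
    Candidate("tamoxifen", in_title=False, abstract_count=1, in_last_sentence=False),
    Candidate("BRCA1", in_title=True, abstract_count=4, in_last_sentence=True),
    Candidate("TP53", in_title=False, abstract_count=2, in_last_sentence=False),
    Candidate("breast cancer", in_title=True, abstract_count=5, in_last_sentence=True),
]

ranked = rank_candidates(candidates)
for c in ranked:
    print(f"{c.name}: {fused_score(c):.1f}")

# Check the article against a hypothetical expert-designated CAE set.
print(hit_at_k(ranked, gold={"BRCA1"}, k=2))
```

In a learning-based setting, the weight tuple would be fitted on articles with expert-designated CAEs rather than fixed by hand; the ranking and `hit_at_k` check would stay the same.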