Li Dingcheng, Wang Zhen, Wang Liwei, Sohn Sunghwan, Shen Feichen, Murad Mohammad Hassan, Liu Hongfang
Department of Health Sciences Research, Mayo Clinic, Rochester, USA.
Watson Health Cloud, IBM, Rochester, USA.
Am J Inf Manag. 2016 Nov;1(1):1-9. Epub 2016 Aug 31.
Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured, reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden of abstract screening in SRs and to provide a high-level summary of the retrieved information. We propose a text-mining framework to support SRs that ranks abstracts using three self-defined, semantics-based metrics: keyword relevance, indexed-term relevance, and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from the indexed vocabulary that domain experts developed for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian model for discovering topics. We tested the proposed framework on three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer, and Influenza Vaccine). Recall reached 100% in all three cases while 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, respectively. Topic analysis showed that the relevant studies identified manually had strong topic similarity, which supports the inclusion of topic analysis as a relevance metric. These results demonstrate that advanced text-mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
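The relationship between ranking and saved screening labor can be illustrated with a minimal Python sketch. It is not the framework's implementation: the abstract does not give the formulas behind the three metrics, so `keyword_relevance` here assumes a simple score (the fraction of search keywords found in an abstract), and `labor_saved_at_full_recall` assumes that labor saved is the share of abstracts ranked below the last relevant one. Both function names are hypothetical.

```python
from typing import Sequence


def keyword_relevance(abstract: str, keywords: Sequence[str]) -> float:
    """Assumed scoring: fraction of the user-defined keywords found in the abstract."""
    if not keywords:
        return 0.0
    text = abstract.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def labor_saved_at_full_recall(scores: Sequence[float],
                               relevant: Sequence[bool]) -> float:
    """Share of abstracts that need no screening once recall reaches 100%.

    Abstracts are screened in descending score order; screening stops
    right after the last relevant abstract has been seen.
    """
    if len(scores) != len(relevant):
        raise ValueError("scores and relevant must have the same length")
    if not any(relevant):
        raise ValueError("at least one relevant abstract is required")
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    last_relevant_rank = max(
        rank for rank, i in enumerate(order, start=1) if relevant[i]
    )
    return 1.0 - last_relevant_rank / len(scores)
```

Given a list of abstracts with manual inclusion labels, scoring each abstract with `keyword_relevance` and passing the scores to `labor_saved_at_full_recall` yields a figure of the same kind as the percentages reported above, although the study's ranking also draws on indexed-term and topic relevance.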