Institute of Global Health, University of Geneva, Geneva, Switzerland.
Médecins Sans Frontières, Geneva, Switzerland.
J Med Internet Res. 2023 Sep 15;25:e39736. doi: 10.2196/39736.
Literature reviews (LRs) identify, evaluate, and synthesize papers relevant to a particular research question to advance understanding and support decision-making. However, LRs, especially traditional systematic reviews, are slow and resource-intensive, and they quickly become outdated.
LiteRev is an enhanced version of an existing automation tool designed to assist researchers in conducting LRs using natural language processing and machine learning techniques. In this paper, we explain LiteRev's capabilities and methodology and evaluate its accuracy and efficiency against a manual LR, highlighting the benefits of using LiteRev.
Based on the user's query, LiteRev performs an automated search on a wide range of open-access databases and retrieves relevant metadata on the resulting papers, including abstracts or full texts when available. These abstracts (or full texts) are preprocessed and represented as a term frequency-inverse document frequency matrix. Using dimensionality reduction (pairwise controlled manifold approximation) and clustering (hierarchical density-based spatial clustering of applications with noise) techniques, the corpus is divided into different topics, each described by a list of its most important keywords. The user can then select one or several topics of interest, enter additional keywords to refine their search, or provide key papers relevant to the research question. Based on these inputs, LiteRev performs a k-nearest neighbor (k-NN) search and suggests a list of potentially interesting papers. By tagging the relevant ones, the user triggers new k-NN searches until no additional paper is suggested for screening. To assess the performance of LiteRev, we ran it in parallel to a manual LR on the burden and care for acute and early HIV infection in sub-Saharan Africa. We assessed the performance of LiteRev using true and false predictive values, recall, and work saved over sampling.
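The iterative k-NN step described above can be illustrated with a minimal sketch. It is not LiteRev's implementation: it uses only the Python standard library, skips the dimensionality reduction and clustering stages, and assumes a smoothed TF-IDF weighting, cosine similarity, and a stopping rule in which the loop ends once a round yields no newly tagged relevant paper. The names `tfidf_vectors`, `nearest`, `iterative_screen`, and the toy `corpus` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption; other weightings are possible).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def nearest(vectors, i, k):
    """Indices of the k papers most similar to paper i (excluding i itself)."""
    others = [j for j in range(len(vectors)) if j != i]
    others.sort(key=lambda j: cosine(vectors[i], vectors[j]), reverse=True)
    return others[:k]


def iterative_screen(docs, key_papers, is_relevant, k=2):
    """Repeat k-NN searches from newly tagged papers until none are tagged.

    `is_relevant` stands in for the human reviewer tagging suggestions.
    Returns (papers tagged relevant, papers shown for screening).
    """
    vectors = tfidf_vectors(docs)
    relevant = set(key_papers)
    screened = set(key_papers)
    frontier = set(key_papers)
    while frontier:
        suggested = set()
        for i in frontier:
            suggested.update(nearest(vectors, i, k))
        suggested -= screened          # only papers not yet seen
        screened |= suggested
        frontier = {j for j in suggested if is_relevant(j)}
        relevant |= frontier
    return relevant, screened


# Toy corpus (invented for illustration, not data from the study).
corpus = [
    "acute hiv infection screening in africa",
    "early hiv infection diagnosis in africa",
    "acute hiv infection care africa",
    "deep learning for image segmentation",
    "image segmentation with convolutional networks",
    "malaria vector control programme",
]
truth = {0, 1, 2}
found, shown = iterative_screen(corpus, key_papers=[0], is_relevant=lambda j: j in truth)
print(sorted(found), sorted(shown))
```

In this toy run the reviewer only ever sees the three HIV papers; the unrelated papers are never suggested, which is the source of the screening effort saved.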
LiteRev extracted, processed, and transformed text into a term frequency-inverse document frequency matrix of 631 unique papers from PubMed. The topic modeling module identified 16 topics and highlighted 2 topics of interest to the research question. Based on 18 key papers, the k-NN module suggested 193 papers for screening out of 613 papers in total (31.5% of the whole corpus) and correctly identified 64 of the 87 relevant papers found by the manual abstract screening (recall of 73.6%). Compared to the manual full-text screening, LiteRev identified 42 of the 48 relevant papers found manually (recall of 87.5%). This represents a total work saved over sampling of 56%.
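The reported percentages can be checked with a short calculation. The sketch below assumes the commonly used definition of work saved over sampling, WSS = (papers not screened) / (total papers) - (1 - recall), and assumes that the full-text recall is the one entering the WSS figure; under those assumptions the numbers above are reproduced. The function names are hypothetical.

```python
def recall(found, total_relevant):
    """Share of the manually identified relevant papers that were also found."""
    return found / total_relevant


def work_saved_over_sampling(n_total, n_screened, recall_value):
    """WSS = (TN + FN) / N - (1 - recall), where TN + FN = papers not screened."""
    return (n_total - n_screened) / n_total - (1.0 - recall_value)


n_total, n_screened = 613, 193          # figures reported in the abstract

screened_share = n_screened / n_total
abstract_recall = recall(64, 87)
fulltext_recall = recall(42, 48)
wss = work_saved_over_sampling(n_total, n_screened, fulltext_recall)

print(f"screened share:   {screened_share:.1%}")   # 31.5%
print(f"abstract recall:  {abstract_recall:.1%}")  # 73.6%
print(f"full-text recall: {fulltext_recall:.1%}")  # 87.5%
print(f"WSS:              {wss:.1%}")              # 56.0%
```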
We presented the features and functionalities of LiteRev, an automation tool that uses natural language processing and machine learning methods to streamline and accelerate LRs and to help researchers obtain quick, in-depth overviews of any topic of interest.