IMO Health, 9600 W Bryn Mawr Ave # 100, Rosemont, IL, 60018, United States.
Merck & Co, Inc, 126 East Lincoln Ave, Rahway, NJ, United States, 1 619-643-2693.
JMIR Med Inform. 2024 Oct 23;12:e54653. doi: 10.2196/54653.
Systematic literature review (SLR), a robust method to identify and summarize evidence from published sources, is considered to be a complex, time-consuming, labor-intensive, and expensive task.
This study aimed to present a solution based on natural language processing (NLP) that accelerates and streamlines the SLR process for observational studies using real-world data.
We followed an agile software development and iterative software engineering methodology to build a customized, intelligent, end-to-end, living NLP-assisted solution for observational SLR tasks. Multiple machine learning-based NLP algorithms were adopted to automate article screening and data element extraction. Following a human-in-the-loop design, domain experts can further review and verify the NLP prediction results. The system integrates explainable artificial intelligence to provide evidence for the NLP algorithms' outputs and to add transparency to the extracted literature data elements. The system was developed based on 3 existing SLR projects of observational studies: epidemiology studies of human papillomavirus-associated diseases, the disease burden of pneumococcal diseases, and cost-effectiveness studies of pneumococcal vaccines.
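The human-in-the-loop step can be pictured as routing model predictions to expert reviewers, whose decisions take precedence over the model. The sketch below is hypothetical and not the platform's implementation: the `Prediction` record, the confidence-threshold routing rule, and the 0.9 default are all assumptions made for illustration (the abstract states only that experts can review and verify NLP predictions).

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical screening prediction for one article."""
    article_id: str
    label: str         # e.g. "include" or "exclude"
    confidence: float  # model probability for `label`, in [0, 1]


def triage(predictions, threshold=0.9):
    """Assumed routing rule: high-confidence predictions are provisionally
    accepted; the rest go to a domain-expert review queue."""
    auto, review = [], []
    for p in predictions:
        (auto if p.confidence >= threshold else review).append(p)
    return auto, review


def apply_expert_decisions(predictions, decisions):
    """Final labels: an expert decision overrides the model's label
    wherever one exists; otherwise the model's label is kept."""
    return {p.article_id: decisions.get(p.article_id, p.label) for p in predictions}


preds = [
    Prediction("a1", "include", 0.97),
    Prediction("a2", "exclude", 0.55),
    Prediction("a3", "exclude", 0.93),
]
auto, review = triage(preds)
final = apply_expert_decisions(preds, {"a2": "include"})
print([p.article_id for p in review], final)
```

Here article `a2` falls below the assumed threshold, is sent for review, and the expert's "include" replaces the model's "exclude"; in practice experts could also spot-check the auto-accepted queue.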
Our Intelligent SLR Platform covers major SLR steps, including study protocol setting, literature retrieval, abstract screening, full-text screening, data element extraction from full-text articles, results summary, and data visualization. The NLP algorithms achieved accuracy scores of 0.86-0.90 on article screening tasks (framed as text classification tasks) and macroaverage F1 scores of 0.57-0.89 on data element extraction tasks (framed as named entity recognition tasks).
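To make the two reported metrics concrete, the sketch below computes accuracy for screening labels and a macroaveraged F1 for entity extraction, where F1 is computed separately for each entity type and then averaged without weighting. It assumes exact span matching on `(doc_id, start, end, entity_type)` tuples; the matching criterion and the entity type names (`Country`, `SampleSize`) are illustrative assumptions, not details taken from the study.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_f1(gold, pred):
    """Per-type F1 averaged with equal weight per entity type.
    gold, pred: sets of (doc_id, start, end, entity_type), exact-match spans."""
    types = sorted({e[3] for e in gold | pred})
    per_type = {}
    for t in types:
        g = {e for e in gold if e[3] == t}
        p = {e for e in pred if e[3] == t}
        per_type[t] = f1(len(g & p), len(p - g), len(g - p))
    macro = sum(per_type.values()) / len(per_type) if per_type else 0.0
    return macro, per_type


def micro_f1(gold, pred):
    """Pooled counts across all types, shown for contrast with macro F1."""
    return f1(len(gold & pred), len(pred - gold), len(gold - pred))


def accuracy(gold_labels, pred_labels):
    """Share of articles whose predicted screening label matches the gold label."""
    return sum(g == p for g, p in zip(gold_labels, pred_labels)) / len(gold_labels)


gold = {("d1", 0, 5, "Country"), ("d1", 10, 14, "SampleSize"), ("d2", 3, 8, "Country")}
pred = {("d1", 0, 5, "Country"), ("d2", 3, 8, "Country"), ("d2", 20, 25, "SampleSize")}
macro, per_type = macro_f1(gold, pred)
print(macro, per_type, micro_f1(gold, pred))
print(accuracy(["include", "exclude", "exclude", "include"],
               ["include", "exclude", "include", "include"]))
```

In this toy example, `Country` scores 1.0 and `SampleSize` scores 0.0, so the macroaverage is 0.5 while the pooled (micro) F1 is about 0.67. Because a macroaverage weights every entity type equally, a few hard or rare data elements can pull it down, which may partly explain the wider 0.57-0.89 range relative to screening accuracy.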
Cutting-edge NLP algorithms expedite SLR for observational studies, allowing scientists to spend more time on the quality of data and the synthesis of evidence in observational studies. Aligned with the living SLR concept, the system has the potential to update literature data and to enable scientists to stay current, prospectively and continuously, with the literature related to observational studies.