Using a Secure, Continually Updating, Web Source Processing Pipeline to Support the Real-Time Data Synthesis and Analysis of Scientific Literature: Development and Validation Study.

Authors

Vaghela Uddhav, Rabinowicz Simon, Bratsos Paris, Martin Guy, Fritzilas Epameinondas, Markar Sheraz, Purkayastha Sanjay, Stringer Karl, Singh Harshdeep, Llewellyn Charlie, Dutta Debabrata, Clarke Jonathan M, Howard Matthew, Serban Ovidiu, Kinross James

Affiliations

PanSurg Collaborative, Department of Surgery and Cancer, Imperial College London, London, United Kingdom.

Amazon Web Services UK Limited, London, United Kingdom.

Publication information

J Med Internet Res. 2021 May 6;23(5):e25714. doi: 10.2196/25714.

DOI: 10.2196/25714
PMID: 33835932
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8104004/

Abstract

BACKGROUND

The scale and quality of the global scientific response to the COVID-19 pandemic have unquestionably saved lives. However, the COVID-19 pandemic has also triggered an unprecedented "infodemic"; the velocity and volume of data production have overwhelmed many key stakeholders such as clinicians and policy makers, as they have been unable to process structured and unstructured data for evidence-based decision making. Solutions that aim to alleviate this data synthesis-related challenge are unable to capture heterogeneous web data in real time for the production of concomitant answers and are not based on the high-quality information in responses to a free-text query.

OBJECTIVE

The main objective of this project is to build a generic, real-time, continuously updating curation platform that can support the data synthesis and analysis of a scientific literature framework. Our secondary objective is to validate this platform and the curation methodology for COVID-19-related medical literature by expanding the COVID-19 Open Research Dataset via the addition of new, unstructured data.

METHODS

To create an infrastructure that addresses our objectives, the PanSurg Collaborative at Imperial College London has developed a unique data pipeline based on a web crawler extraction methodology. This data pipeline uses a novel curation methodology that adopts a human-in-the-loop approach for the characterization of quality, relevance, and key evidence across a range of scientific literature sources.

RESULTS

REDASA (Realtime Data Synthesis and Analysis) is now one of the world's largest and most up-to-date sources of COVID-19-related evidence; it consists of 104,000 documents. By capturing curators' critical appraisal methodologies through the discrete labeling and rating of information, REDASA rapidly developed a foundational, pooled, data science data set of over 1400 articles in under 2 weeks. These articles provide COVID-19-related information and represent around 10% of all papers about COVID-19.

CONCLUSIONS

This data set can act as ground truth for the future implementation of a live, automated systematic review. The three benefits of REDASA's design are as follows: (1) it adopts a user-friendly, human-in-the-loop methodology by embedding an efficient, user-friendly curation platform into a natural language processing search engine; (2) it provides a curated data set in the JavaScript Object Notation format for experienced academic reviewers' critical appraisal choices and decision-making methodologies; and (3) due to the wide scope and depth of its web crawling method, REDASA has already captured one of the world's largest COVID-19-related data corpora for searches and curation.
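As an illustration of benefit (2), the following minimal sketch shows how a curator's appraisal choices could be stored as JSON records. The abstract does not describe REDASA's actual schema, so every field name here (`document_id`, `curator_id`, `relevance`, `quality`, `key_evidence`), the 1 to 5 rating scale, and the sample values are hypothetical assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class CurationRecord:
    """One curator's appraisal of one document (hypothetical schema)."""
    document_id: str
    curator_id: str
    relevance: int  # assumed discrete rating, 1 (low) to 5 (high)
    quality: int    # assumed discrete rating, 1 (low) to 5 (high)
    key_evidence: List[str] = field(default_factory=list)  # highlighted passages


def to_json(records: List[CurationRecord]) -> str:
    """Serialize appraisal records to a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


def mean_rating(records: List[CurationRecord], attribute: str) -> float:
    """Average one rating attribute across curators."""
    values = [getattr(r, attribute) for r in records]
    return sum(values) / len(values)


# Illustrative values only; these are not taken from the REDASA data set.
records = [
    CurationRecord("doc-001", "curator-a", relevance=5, quality=4,
                   key_evidence=["example highlighted sentence"]),
    CurationRecord("doc-001", "curator-b", relevance=4, quality=3),
]

if __name__ == "__main__":
    print(to_json(records))
    print(mean_rating(records, "quality"))
```

Keeping each appraisal as a separate record, rather than storing only an averaged score, preserves individual curators' decisions, which is what a ground-truth data set for later automation would need.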

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d59a/8104004/538862e16a93/jmir_v23i5e25714_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d59a/8104004/9324a325838d/jmir_v23i5e25714_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d59a/8104004/dbb7dfd97bdc/jmir_v23i5e25714_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d59a/8104004/f252e0471017/jmir_v23i5e25714_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d59a/8104004/dd53a7067892/jmir_v23i5e25714_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d59a/8104004/712499e4c5c1/jmir_v23i5e25714_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d59a/8104004/6e79c8100937/jmir_v23i5e25714_fig7.jpg
