Dockès Jérome, Oudyk Kendra M, Torabi Mohammad, de la Vega Alejandro I, Poline Jean-Baptiste
National Institute for Research in Digital Science and Technology (INRIA), Paris, France.
Montreal Neurological Institute, McGill University, Montreal, Canada.
eLife. 2025 Sep 11;13:RP94909. doi: 10.7554/eLife.94909.
Automated analysis of the biomedical literature (literature mining) offers a rich source of insights. However, such analysis requires collecting a large number of articles and extracting and processing their content. This task is often prohibitively difficult and time-consuming. Here, we provide tools to easily collect, process, and annotate the biomedical literature. In particular, pubget (https://neuroquery.github.io/pubget/pubget.html) is an efficient and reliable command-line tool for downloading articles in bulk from PubMed Central, extracting their contents and metadata into convenient formats, and extracting and analyzing information such as stereotactic brain coordinates. labelbuddy (https://jeromedockes.github.io/labelbuddy/labelbuddy/current/) is a lightweight local application for annotating text, which facilitates the extraction of complex information or the creation of ground-truth labels to validate automated information extraction methods. Further, we describe repositories where researchers can share their analysis code and their manual annotations in a format that facilitates reuse. These resources can help streamline text mining and meta-science projects and make text mining of the biomedical literature more accessible, effective, and reproducible. We describe a typical workflow based on these tools and illustrate it with several example projects.
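As a rough illustration of the download step in this workflow, the following minimal Python sketch assembles a pubget command line and, only when asked and when the tool is installed, runs it. The `pubget run <data_dir> -q <query>` form is taken from the pubget documentation as we understand it; the helper names `build_pubget_command` and `run_pubget`, the output directory, and the example query are hypothetical and not part of the article.

```python
import shlex
import shutil
import subprocess


def build_pubget_command(data_dir, query):
    """Return the argument list for one pubget download-and-extract run.

    Assumes the `pubget run <data_dir> -q <query>` command-line form.
    """
    return ["pubget", "run", data_dir, "-q", query]


def run_pubget(data_dir, query, dry_run=True):
    """Print the command; execute it only if dry_run is False and pubget is on PATH."""
    cmd = build_pubget_command(data_dir, query)
    print(shlex.join(cmd))
    if dry_run or shutil.which("pubget") is None:
        return None  # nothing was executed
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Hypothetical query and output directory, shown as a dry run.
    run_pubget("./pubget_data", "fMRI[title]")
```

Keeping the dry run as the default means the sketch can be inspected without network access or a pubget installation; passing `dry_run=False` would start an actual download from PubMed Central.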