Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain.
Institute of Biomedical Engineering, Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, UK.
Adv Healthc Mater. 2023 Oct;12(25):e2300150. doi: 10.1002/adhm.202300150. Epub 2023 Aug 10.
Biomaterials research output has grown exponentially over the last three decades. Most of this research is published as scientific articles and is therefore available only as unstructured text, which makes it a challenging input for computational processing. Computational tools are becoming essential to overcome this information overload. Among them, text mining systems are an attractive option for automatically extracting information from text documents into structured datasets. This work presents the first automated system for extracting biomaterial-related information from research abstracts in MEDLINE, the National Library of Medicine's premier bibliographic database, into a searchable database. The system is a text mining pipeline that periodically retrieves abstracts from PubMed and identifies research and clinical studies of biomaterials. The pipeline then identifies sixteen concept types of interest in each abstract using the Biomaterials Annotator, a tool for biomaterials Named Entity Recognition (NER). These concepts, along with the abstract and relevant metadata, are then deposited in DEBBIE, the Database of Experimental Biomaterials and their Biological Effect. DEBBIE is accessible through a web application that provides keyword search and displays results in an intuitive and meaningful manner, aiming to facilitate efficient mapping and organization of biomaterials information.
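The stages described in the abstract (relevance filtering, concept annotation, deposition, keyword search) can be illustrated with a minimal Python sketch. This is not DEBBIE's implementation: a simple keyword rule stands in for the study classifier, a small dictionary lookup stands in for the Biomaterials Annotator, and the names `BIOMATERIAL_TERMS`, `CONCEPT_LEXICON`, `ToyDatabase` and the concept labels `MATERIAL`, `CELL` and `ORGANISM` are hypothetical (they are not the annotator's sixteen concept types). The periodic PubMed retrieval is assumed to supply each batch of `Abstract` objects and is not shown.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-ins: a real pipeline would use a trained study classifier
# and a proper NER tool; here both are replaced by tiny keyword dictionaries.
BIOMATERIAL_TERMS = {"hydrogel", "scaffold", "implant", "hydroxyapatite"}
CONCEPT_LEXICON = {
    "hydrogel": "MATERIAL",
    "hydroxyapatite": "MATERIAL",
    "osteoblast": "CELL",
    "rat": "ORGANISM",
}


@dataclass
class Abstract:
    pmid: str
    title: str
    text: str


@dataclass
class Record:
    pmid: str
    title: str
    text: str
    concepts: list = field(default_factory=list)  # (surface form, concept type)


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_biomaterial_study(abstract):
    """Toy relevance filter: keep abstracts mentioning any seed term."""
    return any(tok in BIOMATERIAL_TERMS for tok in tokenize(abstract.text))


def annotate(abstract):
    """Toy dictionary-based NER returning (token, concept type) pairs."""
    return [(tok, CONCEPT_LEXICON[tok])
            for tok in tokenize(abstract.text) if tok in CONCEPT_LEXICON]


class ToyDatabase:
    """In-memory store with an inverted index for keyword (AND) search."""

    def __init__(self):
        self.records = {}
        self.index = {}  # keyword -> set of PMIDs

    def deposit(self, record):
        self.records[record.pmid] = record
        for tok in tokenize(record.title + " " + record.text):
            self.index.setdefault(tok, set()).add(record.pmid)

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return []
        # A record matches only if it contains every query term.
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


def run_pipeline(abstracts, db):
    """Filter, annotate and deposit one batch of retrieved abstracts."""
    for ab in abstracts:
        if not is_biomaterial_study(ab):
            continue
        db.deposit(Record(ab.pmid, ab.title, ab.text, annotate(ab)))


if __name__ == "__main__":
    db = ToyDatabase()
    batch = [
        Abstract("1", "Hydrogel scaffolds",
                 "A hydrogel seeded with osteoblast cells was implanted in a rat model."),
        Abstract("2", "Survey of hospital costs",
                 "Costs of outpatient care were analysed."),
    ]
    run_pipeline(batch, db)
    print(db.search("hydrogel rat"))  # ['1']
    print(db.records["1"].concepts)
```

Keeping the filter, the annotator and the store as separate functions mirrors the staged design described above: any one stage (for example, the dictionary lookup) could be swapped for a trained model without changing how records are deposited or searched.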