Institute of Biomedical Engineering, Botnar Research Centre, Nuffield Orthopaedic Centre, University of Oxford, Oxford OX3 7LD, UK.
Institute of Social Research, University of Michigan, MI 48104, USA.
N Biotechnol. 2023 Nov 25;77:161-175. doi: 10.1016/j.nbt.2023.09.001. Epub 2023 Sep 4.
Scientific information extraction is fundamental for research and innovation, but it is currently a mostly manual, time-consuming process. Text mining tools (TMTs) enable automated, accurate and quick information extraction from text, but there is little precedent for their use in the biomaterials field. Here, we compare the ability of various TMTs to extract useful information from biomaterials abstracts. Focusing on the biocompatibility of polydioxanone, a biodegradable polymer for which there are relatively few scientific publications, we tested several tools ranging from machine learning approaches and statistical text analysis to MeSH indexing and domain-specific semantic tools for Named Entity Recognition. We also evaluated their output alongside a manual review of systematic reviews and meta-analyses. The findings show that TMTs can be highly efficient and powerful for mapping biomaterials texts and can rapidly yield up-to-date information. In this study, TMTs made it possible to identify dominant themes, trace the evolution of specific terms and topics, and learn about key medical applications in the biomaterials literature over the years. The analysis also shows that ambiguity in biomaterials nomenclature is a significant challenge in mining biomedical literature that is yet to be tackled. This research showcases the potential value of using Natural Language Processing and domain-specific tools to extract and organize biomaterials data.