Bonatti Amedeo Franco, Chiarello Filippo, Vozzi Giovanni, De Maria Carmelo
Department of Information Engineering and Research Center "Enrico Piaggio," University of Pisa, Pisa, Italy.
Department of Energy, Systems, Territory and Construction Engineering, University of Pisa, Pisa, Italy.
3D Print Addit Manuf. 2024 Aug 20;11(4):1495-1509. doi: 10.1089/3dp.2022.0316. eCollection 2024 Aug.
Bioprinting is a rapidly evolving field, as evidenced by the exponential growth in the number of articles and reviews published on the topic each year. As the number of publications increases, researchers need an automatic tool that can help them carry out more comprehensive literature analyses, standardize nomenclature, and thereby accelerate the development of novel manufacturing techniques and materials for the field. In this context, we propose an automatic keyword annotation model, based on Natural Language Processing (NLP) techniques, that can be used to extract insights from the bioprinting scientific literature. The approach relies on two main data sources, the abstracts and their associated author keywords, which are used to train a composite model consisting of (i) an embedding component (using the FastText algorithm), which generates a word vector for an input keyword, and (ii) a classifier component (using the Support Vector Machine algorithm), which labels the keyword, based on its word vector, as a manufacturing technique, a material employed, or an application of the bioprinted product. The composite model was trained and tuned through a two-stage optimization procedure to maximize classification performance. The annotated author keywords were then reprojected onto the abstract collection, both to generate a lexicon of the bioprinting field and to extract relevant information, such as technology trends and the relationships among manufacturing technique, material, and application. The proposed approach can serve as a basis for more complex NLP analyses aimed at automating the study of the bioprinting literature.
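The composite embedding-plus-classifier idea can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation and makes several loud assumptions: the trained FastText model is replaced by untrained, hash-seeded character n-gram vectors (which keeps only the subword-sharing property of FastText), and the SVM is a linear one-vs-rest model fitted with the Pegasos subgradient method instead of a library solver. The names `char_ngrams`, `embed`, `train_linear_svm`, and `KeywordAnnotator`, the hyperparameters (`DIM`, `lam`, `epochs`), and the toy keywords are all hypothetical. The two-stage optimization and the reprojection onto abstracts are not reproduced.

```python
import hashlib
import math
import random

DIM = 100  # assumed embedding size (hypothetical; matches FastText's default dimensionality)


def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style subwords: the bracketed word plus its character n-grams."""
    token = f"<{word.lower()}>"
    grams = [token]
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def gram_vector(gram):
    """Deterministic pseudo-random vector per n-gram (stand-in for trained weights)."""
    seed = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def embed(keyword):
    """Sum the subword vectors of every word in the keyword, then L2-normalize."""
    vec = [0.0] * DIM
    for word in keyword.split():
        for gram in char_ngrams(word):
            for i, x in enumerate(gram_vector(gram)):
                vec[i] += x
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos subgradient solver for a hinge-loss linear SVM; y holds -1/+1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last weight is a bias (regularized here for simplicity)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # hinge loss active: move toward the misclassified/close example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


class KeywordAnnotator:
    """One-vs-rest linear SVMs on top of subword embeddings (hypothetical class)."""

    def __init__(self, lam=0.01, epochs=100):
        self.lam = lam
        self.epochs = epochs

    def fit(self, keywords, labels):
        X = [embed(k) for k in keywords]
        self.classes = sorted(set(labels))
        self.weights = {
            c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels],
                                self.lam, self.epochs)
            for c in self.classes
        }
        return self

    def predict(self, keyword):
        x = embed(keyword) + [1.0]
        scores = {c: sum(wj * xj for wj, xj in zip(w, x))
                  for c, w in self.weights.items()}
        return max(scores, key=scores.get)  # class with the largest SVM margin


# Hypothetical toy training set (not data from the paper).
train = [
    ("extrusion bioprinting", "manufacturing"),
    ("inkjet printing", "manufacturing"),
    ("stereolithography", "manufacturing"),
    ("gelatin methacrylate", "material"),
    ("alginate hydrogel", "material"),
    ("polycaprolactone", "material"),
    ("cartilage regeneration", "application"),
    ("bone tissue engineering", "application"),
    ("wound healing", "application"),
]
model = KeywordAnnotator().fit([k for k, _ in train], [lab for _, lab in train])
for kw in ["alginate", "inkjet bioprinting", "bone regeneration"]:
    print(kw, "->", model.predict(kw))
```

Because each keyword vector is built from character n-grams, unseen or inflected variants (for example "hydrogels" versus "hydrogel") share most of their subwords and land close together, which is the property that makes a FastText-style embedding a convenient front end for a keyword classifier. In a real pipeline the n-gram vectors would be learned from the abstract corpus rather than drawn at random.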