Dorr Ricardo A, Casal Juan J, Toriano Roxana
Facultad de Medicina, Instituto de Fisiología y Biofísica Bernardo Houssay (IFIBIO Houssay), CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina.
Healthc Inform Res. 2022 Jul;28(3):276-283. doi: 10.4258/hir.2022.28.3.276. Epub 2022 Jul 31.
Automated information-extraction systems are becoming increasingly useful given the enormous volume of existing literature and the growing number of medical articles published worldwide. We aimed to develop an accessible method that uses the open-source platform KNIME to perform text mining (TM) on indexed publications. As a case study, we retrieved and integrated material from life-science publications by mining information on hemolytic uremic syndrome (HUS).
Text retrieved from Europe PubMed Central (Europe PMC) was processed with specific KNIME nodes. The results were presented as tables or graphs, and the data could also be compared with data from other sources.
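For readers who want to see what the retrieval step involves outside KNIME, the following is a minimal sketch that queries the Europe PMC REST search endpoint directly from Python. It is not the authors' KNIME workflow. The query string, page size, and the choice of kept fields (title, authors, year, abstract) are illustrative assumptions.

```python
"""Minimal sketch: retrieve records from the Europe PMC REST search API.

Assumption: this calls the public REST endpoint directly and is NOT the
KNIME workflow described in the paper. Field selection is illustrative.
"""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 100, cursor: str = "*") -> str:
    """Build a search URL that returns full ("core") records as JSON."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "core",   # core records include abstractText
        "pageSize": page_size,
        "cursorMark": cursor,   # "*" requests the first page
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_records(payload: dict) -> list[dict]:
    """Keep only the fields used downstream: title, authors, year, abstract."""
    records = []
    for hit in payload.get("resultList", {}).get("result", []):
        records.append({
            "title": hit.get("title", ""),
            "authors": hit.get("authorString", ""),
            # pubYear arrives as a string; missing years become None
            "year": int(hit["pubYear"]) if hit.get("pubYear") else None,
            "abstract": hit.get("abstractText", ""),
        })
    return records


def fetch(query: str, page_size: int = 100) -> list[dict]:
    """Fetch one page of results (requires network access)."""
    with urlopen(build_search_url(query, page_size)) as resp:
        return parse_records(json.load(resp))
```

A call such as `fetch('"hemolytic uremic syndrome"')` would return one page of records; retrieving the full result set would mean repeating the request with the `nextCursorMark` value from each response until it stops changing.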
Applying TM to the scientific literature on HUS as a case study, and selecting various fields from the articles, made it possible to obtain a list of individual authors, build bags of words and study their frequency and use over time, discriminate topics (HUS vs. atypical HUS) in an unsupervised manner, and cross-reference the information with a list of FDA-approved drugs.
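Several of these analyses reduce to simple counting once the text is tokenized. The sketch below reproduces three of them (author list, per-year bag-of-words frequencies, and drug cross-referencing) in plain Python rather than KNIME nodes. It assumes each record is a dict with `authors` (Europe PMC-style "Surname I, Surname I." string), `year`, and `abstract` keys; the stop-word list, tokenizer, and drug list are placeholders, and because matching is per token, multi-word drug names would not be found. The unsupervised HUS vs. atypical HUS separation is not reproduced here.

```python
"""Sketch of author listing, bag-of-words frequency by year, and drug
cross-referencing, in plain Python (not the paper's KNIME workflow).

Assumptions: records carry "authors", "year" and "abstract" keys; the
stop words and drug list are illustrative placeholders.
"""
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "of", "and", "in", "a", "with", "to", "was", "is", "for"}


def tokenize(text: str) -> list[str]:
    """Lower-case, keep alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def author_list(records: list[dict]) -> list[str]:
    """Unique individual authors from 'Surname I, Surname I.' strings."""
    authors = set()
    for rec in records:
        for name in rec["authors"].rstrip(".").split(","):
            if name.strip():
                authors.add(name.strip())
    return sorted(authors)


def term_frequency_by_year(records: list[dict]) -> dict[int, Counter]:
    """Bag-of-words counts per publication year, to follow term use over time."""
    by_year: dict[int, Counter] = defaultdict(Counter)
    for rec in records:
        by_year[rec["year"]].update(tokenize(rec["abstract"]))
    return dict(by_year)


def drugs_mentioned(records: list[dict], drug_names: set[str]) -> Counter:
    """Count how many records mention each drug from a supplied list."""
    wanted = {d.lower() for d in drug_names}
    hits = Counter()
    for rec in records:
        # set intersection counts each drug at most once per record
        hits.update(wanted & set(tokenize(rec["abstract"])))
    return hits
```

Counting each drug once per record, rather than once per occurrence, measures how many publications mention it, which is usually the more useful figure when comparing against an external drug list.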
By following the instructions in the tutorial, researchers without programming skills can successfully perform TM on the indexed scientific literature. This KNIME-based methodology could become a useful tool for computing statistics, analyzing behaviors, following trends, and making forecasts related to medical issues. The advantages of TM with KNIME include integrating scientific information, supporting literature reviews, and optimizing the management of resources dedicated to basic and clinical research.