Institute of Computing, University of Campinas, Campinas, SP, Brazil.
Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, NTNU - Norwegian University of Science and Technology, Ålesund, Norway.
BMC Med Inform Decis Mak. 2020 Dec 14;20(Suppl 4):314. doi: 10.1186/s12911-020-01341-5.
Knowledge is often produced from data generated in scientific investigations. An ever-growing number of scientific studies in several domains result in massive amounts of data, and obtaining new knowledge from them requires computational help. Alzheimer's Disease, a life-threatening degenerative disease that is not yet curable, is one example. As the scientific community strives to better understand the disease and find a cure, great amounts of data have been generated, from which new knowledge can be produced. A proper representation of such knowledge brings great benefits to researchers, to the scientific community, and consequently, to society.
In this article, we study and evaluate a semi-automatic method that generates knowledge graphs (KGs) from biomedical texts in the scientific literature. Our solution uses natural language processing techniques to extract knowledge from the scientific literature and represent it in KGs. Our method links the entities and relations represented in the KGs to concepts from existing biomedical ontologies available on the Web. We demonstrate the effectiveness of our method by generating KGs from unstructured texts obtained from a set of abstracts of scientific papers on Alzheimer's Disease. Physicians compared the triples extracted by our method with the triples they extracted manually from their own analysis of the same abstracts. The evaluation also included a qualitative analysis, by the physicians, of the KGs generated with our software tool.
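The abstract does not specify the extraction rules or the ontologies used, so the following is only a minimal, stdlib-only sketch of the general pipeline (rule-based triple extraction, then ontology linking, then KG edges) under stated assumptions. A hypothetical list of relation cue phrases (`RELATION_CUES`) stands in for the rule set, and a hypothetical in-memory dictionary (`TOY_ONTOLOGY`, with placeholder `EX:` identifiers) stands in for lookups against real biomedical ontologies on the Web. The names `extract_triples`, `link`, and `build_kg` are illustrative and are not the authors' tool.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical relation cue phrases standing in for a rule set (assumption:
# the abstract does not list the actual rules).
RELATION_CUES = ["is associated with", "increases", "reduces", "causes", "inhibits"]

# Hypothetical in-memory "ontology": lowercase surface form -> concept ID.
# The EX: identifiers are placeholders, not real ontology IRIs.
TOY_ONTOLOGY = {
    "alzheimer's disease": "EX:0001",
    "amyloid beta": "EX:0002",
    "tau": "EX:0003",
}


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def extract_triples(text: str) -> List[Triple]:
    """Apply the cue-phrase rules sentence by sentence; keep the first match."""
    triples = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        sentence = sentence.rstrip(".").strip()
        for cue in RELATION_CUES:
            # Subject = text before the cue, object = text after it.
            m = re.match(rf"^(.+?)\s+{re.escape(cue)}\s+(.+)$", sentence, re.IGNORECASE)
            if m:
                triples.append(Triple(m.group(1).strip(), cue, m.group(2).strip()))
                break
    return triples


def link(term: str) -> Optional[str]:
    """Return the placeholder concept ID for a term, or None if it is unknown."""
    return TOY_ONTOLOGY.get(term.lower())


Node = Tuple[str, Optional[str]]


def build_kg(triples: List[Triple]) -> List[Tuple[Node, str, Node]]:
    """Turn triples into KG edges whose endpoints carry (label, concept ID)."""
    return [
        ((t.subject, link(t.subject)), t.relation, (t.obj, link(t.obj)))
        for t in triples
    ]


if __name__ == "__main__":
    abstract = ("Amyloid beta is associated with Alzheimer's disease. "
                "Tau increases neuronal damage.")
    for edge in build_kg(extract_triples(abstract)):
        print(edge)
```

On the sample text, `extract_triples` finds two triples. `build_kg` then links "Amyloid beta", "Alzheimer's disease", and "Tau" to their placeholder IDs and leaves "neuronal damage" unlinked (`None`). A realistic pipeline would rely on NLP parsing rather than fixed string patterns, and on ontology search services rather than an in-memory dictionary.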
The experimental results attest to the quality of the generated KGs. The proposed method extracts a large number of triples, showing the effectiveness of the rule-based method we employ to identify relations in texts. In addition, ontology links are successfully obtained, which demonstrates the effectiveness of the ontology linking method proposed in this investigation.
We demonstrate that our proposal is effective in building ontology-linked KGs that represent the knowledge obtained from biomedical scientific texts. Such a representation can add value to research in various domains by enabling researchers to compare the occurrence of concepts across different studies. The generated KGs may pave the way for proposing new theories based on data analysis, advancing the state of the art in their research domains.