Rensselaer Polytechnic Institute, Troy, NY.
IBM Research, Cambridge, MA.
AMIA Annu Symp Proc. 2021 Jan 25;2020:462-471. eCollection 2020.
When healthcare providers review the results of a clinical trial to judge its applicability to their practice, they typically analyze how well the characteristics of the study cohort match those of the patients they see. We have previously created a study cohort ontology to standardize this information and make it accessible for knowledge-based decision support. Extracting this information from research publications is challenging, however, given the wide variation in how cohort characteristics are reported in tables. To address this issue, we have developed an ontology-enabled knowledge extraction pipeline that automatically constructs knowledge graphs from the cohort characteristics found in PDF-formatted research papers. We evaluated our approach on training and test sets drawn from 41 research publications and achieved an overall accuracy of 83.3% in correctly assembling the knowledge graphs. Our research provides a promising approach for extracting knowledge more broadly from tabular information in research publications.
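The abstract does not describe how an individual table cell becomes part of a knowledge graph. As a rough illustration only, the Python sketch below maps cells from a baseline-characteristics table (one row per characteristic, one column per cohort arm) to subject-predicate-object triples. It is not the authors' pipeline: it skips PDF table extraction entirely, and the `ex:` prefix, the predicate names (`ex:hasCharacteristic`, `ex:mean`, `ex:count`, and so on), the function names `cell_to_triples` and `build_graph`, and the two recognized cell formats ("mean ± SD" and "n (%)") are all hypothetical assumptions rather than terms from the study cohort ontology.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace prefix; NOT the study cohort ontology's real IRIs.
EX = "ex:"

# Assumed cell formats: "61.2 ± 9.8" (mean ± SD) and "45 (52.3%)" (count and percent).
MEAN_SD = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)\s*$")
COUNT_PCT = re.compile(r"^\s*(\d+)\s*\(\s*(\d+(?:\.\d+)?)\s*%\s*\)\s*$")


def cell_to_triples(cohort: str, characteristic: str, cell: str) -> List[Triple]:
    """Map one table cell (cohort arm x characteristic row) to triples."""
    # Build a node identifier by collapsing non-word characters in the row label.
    slug = re.sub(r"\W+", "_", characteristic).strip("_")
    node = f"{EX}{cohort}_{slug}"
    triples: List[Triple] = [
        (f"{EX}{cohort}", f"{EX}hasCharacteristic", node),
        (node, f"{EX}label", characteristic),
    ]
    m = MEAN_SD.match(cell)
    if m:
        triples.append((node, f"{EX}mean", m.group(1)))
        triples.append((node, f"{EX}standardDeviation", m.group(2)))
        return triples
    m = COUNT_PCT.match(cell)
    if m:
        triples.append((node, f"{EX}count", m.group(1)))
        triples.append((node, f"{EX}percentage", m.group(2)))
        return triples
    # Unrecognized format: keep the raw text so it can be reviewed manually.
    triples.append((node, f"{EX}rawValue", cell.strip()))
    return triples


def build_graph(table: Dict[str, Dict[str, str]]) -> List[Triple]:
    """Convert {characteristic: {cohort: cell}} into a flat list of triples."""
    graph: List[Triple] = []
    for characteristic, arms in table.items():
        for cohort, cell in arms.items():
            graph.extend(cell_to_triples(cohort, characteristic, cell))
    return graph
```

For example, with a made-up table such as `{"Age, years": {"Intervention": "61.2 ± 9.8"}}`, `build_graph` yields a node `ex:Intervention_Age_years` carrying `ex:mean "61.2"` and `ex:standardDeviation "9.8"`. The input values here are invented for illustration and are not data from the paper. Falling back to a raw-value triple for unrecognized formats mirrors the abstract's point that reporting styles vary widely; a real pipeline would need many more patterns and an ontology-backed vocabulary.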