Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19713, USA.
Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19716, USA.
BMC Bioinformatics. 2024 Jun 13;25(1):213. doi: 10.1186/s12859-024-05812-8.
Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive body of publicly available information. The field has grown increasingly popular thanks to modern machine learning algorithms. However, the automated evaluation of HG systems remains an open problem, especially at larger scale.
This paper presents Dyport, a novel benchmarking framework for evaluating biomedical hypothesis generation systems. Using curated datasets, our approach tests these systems under realistic conditions, which makes the evaluation more relevant. We integrate knowledge from curated databases into a dynamic graph and pair it with a method for quantifying the importance of each discovery. The benchmark therefore assesses not only the accuracy of hypotheses but also their potential impact on biomedical research, which significantly extends traditional link prediction benchmarks. We demonstrate the benchmarking process on several link prediction systems applied to biomedical semantic knowledge graphs. The framework is flexible and designed for broad use in verifying hypothesis generation quality, with the aim of expanding the scope of scientific discovery within the biomedical research community.
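The evaluation idea described above (a time-sliced graph plus an importance score per discovery) can be illustrated with a minimal Python sketch. This is not Dyport's implementation: the `Edge` record, the `split_by_year` and `weighted_recall_at_k` helpers, the cut-off year, and the importance values are all hypothetical and chosen only for illustration. The sketch hides every connection that appears after a cut-off year, then measures how much of the total importance of those future connections a system recovers among its top-k predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    year: int          # year the connection first appears (hypothetical field)
    importance: float  # illustrative impact score, not Dyport's actual metric


def split_by_year(edges, cutoff_year):
    """Split edges into those known at the cut-off and those discovered later."""
    known = [e for e in edges if e.year <= cutoff_year]
    future = [e for e in edges if e.year > cutoff_year]
    return known, future


def weighted_recall_at_k(ranked_pairs, future_edges, k):
    """Share of total future-edge importance recovered in the top-k predictions.

    Pairs are treated as undirected, so ("A", "C") matches ("C", "A").
    """
    weights = {frozenset((e.source, e.target)): e.importance for e in future_edges}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    top_k = {frozenset(pair) for pair in ranked_pairs[:k]}
    recovered = sum(w for pair, w in weights.items() if pair in top_k)
    return recovered / total


# Toy data: two connections are known by 2020, two are discovered afterwards.
edges = [
    Edge("A", "B", 2015, 1.0),
    Edge("B", "C", 2018, 1.0),
    Edge("A", "C", 2021, 5.0),  # high-impact future discovery
    Edge("B", "D", 2022, 1.0),  # low-impact future discovery
]
known, future = split_by_year(edges, cutoff_year=2020)

# Hypothetical system output: candidate pairs ranked from most to least likely.
ranked = [("C", "A"), ("A", "D"), ("B", "D")]

print(round(weighted_recall_at_k(ranked, future, k=1), 3))  # 0.833
```

In this toy case, a system that ranks the single high-impact connection first already recovers five sixths of the total importance, whereas plain recall at k=1 would credit it with only one of two future edges. That gap is the kind of difference an impact-aware benchmark is meant to expose.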
Dyport is an open-source benchmarking framework for evaluating biomedical hypothesis generation systems that takes into account knowledge dynamics, semantics, and impact. All code and datasets are available at https://github.com/IlyaTyagin/Dyport.