Jackson David, Groth Paul, Harmouch Hazar
University of Amsterdam, Amsterdam, The Netherlands.
J Biomed Semantics. 2025 Aug 13;16(1):14. doi: 10.1186/s13326-025-00336-3.
BACKGROUND: Bioactive compounds found in foods and plants can provide health benefits, including antioxidant and anti-inflammatory effects. Research into their role in disease prevention and personalized nutrition is expanding, but data complexity, inconsistent methods, and the rapid growth of the scientific literature can hinder progress. To address these issues, we developed BASIL DB (BioActive Semantic Integration and Linking Database), a knowledge graph (KG) database that uses natural language processing (NLP) techniques to streamline data organization and analysis. This automated approach offers greater scalability and comprehensiveness than traditional methods such as manual data curation and entry.

CONSTRUCTION AND CONTENT: Construction of the BASIL DB is divided into four steps: data collection, data preprocessing, data extraction, and data integration. Data on bioactives and foods are sourced from structured databases, and the relevant randomized controlled trials (RCTs) are retrieved from PubMed. The data are then prepared by cleaning inconsistencies and structuring them for analysis. In the data extraction phase, NLP tools, including a large language model (LLM), are used to analyze clinical trials and extract data on bioactive compounds and their health impacts. The integration phase compiles these data into a knowledge graph whose nodes are the entities Foods, Bioactives, and Health Conditions and whose edges are the interactions between them. To quantify these interactions, we generate a weight for each edge on the basis of empirical evidence and methodological rigor.

UTILITY AND DISCUSSION: The BASIL DB incorporates 433 compounds, 40296 research papers, 7256 health effects, and 4197 food items. The database features query and visualization capabilities, including interactive graphs and custom filtering options, that showcase different aspects of the data. Users can explore the relationships between bioactives and health effects, which improves both research efficiency and insight discovery.

CONCLUSION: The BASIL DB is a knowledge graph database of bioactive compounds. This study provides a structured resource for exploring the relationships among bioactives, foods, and health outcomes, representing a step toward a more systematic and data-driven approach to understanding the health effects of bioactive compounds. Future work will focus on expanding the database and refining the methods used. Extending the BASIL DB will help bridge the gap between traditional and conventional approaches to nutrition, guiding future research in bioactive compound discovery and health optimization.

AVAILABILITY: Users can access and explore the data via https://basil-db.github.io/info.html or fork and run the respective script via https://github.com/basil-db/script.
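
The graph structure described under CONSTRUCTION AND CONTENT can be illustrated with a minimal sketch. Only the three node kinds (Foods, Bioactives, Health Conditions) and the idea of evidence-based edge weights come from the abstract. The class names, the `edge_weight` formula, the placeholder entity names, and the trial counts below are all assumptions made for illustration; the abstract does not specify how BASIL DB computes its weights, and this is not the authors' implementation.

```python
from dataclasses import dataclass, field

# Node kinds named in the abstract; everything else here is illustrative.
KINDS = {"Food", "Bioactive", "HealthCondition"}


@dataclass(frozen=True)
class Node:
    kind: str
    name: str

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown node kind: {self.kind}")


def edge_weight(supporting_trials: int, total_trials: int, rigor: float) -> float:
    """Hypothetical score in [0, 1]: share of supporting trials scaled by rigor.

    This is NOT the BASIL DB formula, which the abstract does not specify.
    """
    if not 0.0 <= rigor <= 1.0:
        raise ValueError("rigor must lie in [0, 1]")
    if total_trials <= 0:
        return 0.0
    return (supporting_trials / total_trials) * rigor


@dataclass
class KnowledgeGraph:
    # Adjacency map: source node -> {target node: weight}.
    edges: dict = field(default_factory=dict)

    def add_edge(self, source: Node, target: Node, weight: float) -> None:
        self.edges.setdefault(source, {})[target] = weight

    def neighbors(self, source: Node, kind: str) -> list:
        """Return (target, weight) pairs of one kind, highest weight first."""
        targets = self.edges.get(source, {})
        hits = [(t, w) for t, w in targets.items() if t.kind == kind]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)


def build_example() -> KnowledgeGraph:
    """Build a toy graph with placeholder names and made-up trial counts."""
    food = Node("Food", "food_A")
    bioactive = Node("Bioactive", "bioactive_X")
    kg = KnowledgeGraph()
    # Food-to-bioactive edge; the weight 1.0 is an arbitrary placeholder.
    kg.add_edge(food, bioactive, 1.0)
    # Bioactive-to-condition edges weighted by the hypothetical score.
    kg.add_edge(bioactive, Node("HealthCondition", "condition_1"),
                edge_weight(3, 4, 0.5))
    kg.add_edge(bioactive, Node("HealthCondition", "condition_2"),
                edge_weight(1, 4, 0.5))
    return kg
```

Calling `build_example().neighbors(Node("Bioactive", "bioactive_X"), "HealthCondition")` returns the health conditions linked to the placeholder bioactive, ordered by weight. This mirrors the kind of bioactive-to-health-effect exploration the abstract describes, though the real database exposes it through its own query and visualization interface.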