Alghamdi Dalia, Dooley Damion M, Samman Mannar, AlFaiz Ali, Hsiao William W L
Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BC Cancer, Vancouver, BC V5T 4S6, Canada.
Centre for Infectious Disease Genomics and One Health (CIDGOH), Faculty of Health Sciences, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
Bioinform Adv. 2025 Aug 7;5(1):vbaf131. doi: 10.1093/bioadv/vbaf131. eCollection 2025.
With improvements in high-throughput sequencing technologies and the constant generation of large biomedical datasets, biobanks increasingly manage and deliver not just specimens but also specimen-derived data and the associated contextual data. However, reuse of data from different biobanks is hindered by incompatible data representations. Contextual data describing biobank resources often contain unstructured text that is incompatible with computational processes such as automated data discovery and integration. A consistent and comprehensive contextual data framework is therefore needed to improve discoverability, reusability, and integrability across data sources.
The Next Generation Biobanking Ontology (NGBO) is an open-source application ontology that represents omics contextual data and is released under a Creative Commons 4.0 license. The ontology focuses on capturing information about three main activities: wet bench analysis used to generate omics data, bioinformatics analysis used to process and interpret data, and data management. In this paper, we demonstrate the use of the ontology to add semantic statements to real-life use cases and to query data previously stored in unstructured textual format.
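As a rough illustration of what adding semantic statements and then querying them can look like, the sketch below stores a few subject-predicate-object statements and retrieves the data derived from a specimen. It is a minimal sketch in plain Python, not the authors' implementation: every identifier with the `ex:` prefix, and the helper names `match` and `data_derived_from`, are hypothetical and are not actual NGBO terms or tooling.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements for illustration only; the "ex:" names are
# placeholders and NOT real NGBO classes or relations.
TRIPLES: List[Triple] = [
    ("ex:specimen_001", "rdf:type", "ex:Specimen"),
    ("ex:run_042", "rdf:type", "ex:SequencingAssay"),       # wet bench analysis
    ("ex:run_042", "ex:has_input", "ex:specimen_001"),
    ("ex:reads_042", "ex:output_of", "ex:run_042"),
    ("ex:align_007", "rdf:type", "ex:AlignmentProcess"),    # bioinformatics analysis
    ("ex:align_007", "ex:has_input", "ex:reads_042"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def data_derived_from(specimen: str, triples: Iterable[Triple]) -> List[str]:
    """Return datasets produced by any process that took the specimen as input."""
    triples = list(triples)
    processes = {subj for subj, _, _ in match(triples, p="ex:has_input", o=specimen)}
    return sorted(
        subj for subj, _, obj in match(triples, p="ex:output_of") if obj in processes
    )


if __name__ == "__main__":
    print(data_derived_from("ex:specimen_001", TRIPLES))
```

Once the provenance is expressed as explicit statements rather than free text, a question such as "which data came from this specimen?" reduces to pattern matching. In practice an RDF library and SPARQL would replace the hand-written `match` function.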
NGBO is freely available at https://github.com/Dalalghamdi/NGBO and is accessible through the Ontology Lookup Service (OLS) at https://www.ebi.ac.uk/ols4/ontologies/ngbo.