Tirlet Yael, Boudet Matéo, Becker Emmanuelle, Legeai Fabrice, Dameron Olivier
Univ Rennes, Inria, CNRS, IRISA, 35000, Rennes, France.
IGEPP, INRAE, Institut Agro, Univ Rennes, 35653, Le Rheu, France.
Comput Struct Biotechnol J. 2024 Nov 19;23:4232-4241. doi: 10.1016/j.csbj.2024.11.022. eCollection 2024 Dec.
The expansion of multi-omics datasets raises significant challenges for data integration and querying. To overcome these challenges, we developed a generic RDF-based integration schema that connects various types of differential omics data, epigenomics, and regulatory information. This schema employs the FALDO ontology to enable querying based on genomic locations. It can be fully or partially populated, which provides both flexibility and extensibility while supporting complex queries. We validated the schema by reproducing two recently published studies, one in biomedicine and the other in environmental science, demonstrating its genericity and its ability to integrate data efficiently. This schema serves as an effective tool for managing and querying a wide range of multi-omics datasets.