Federated SPARQL query performance evaluation for exploring disease model mouse: combining gene expression, orthology, and disease knowledge graphs.

Authors

Kushida Tatsuya, de Farias Tarcisio Mendes, Sima Ana C, Dessimoz Christophe, Chiba Hirokazu, Bastian Frederic B, Masuya Hiroshi

Affiliations

BioResource Research Center, RIKEN, Tsukuba-shi, Japan.

SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Publication information

BMC Med Inform Decis Mak. 2025 May 16;25(Suppl 1):189. doi: 10.1186/s12911-025-03013-8.

Abstract

BACKGROUND

The RIKEN BioResource Research Center (RIKEN BRC) develops and maintains the RIKEN BioResource MetaDatabase to help users find bioresources suited to their experiments and to provide a precise, high-quality data infrastructure. The SIB Swiss Institute of Bioinformatics develops two multi-species databases for the study of gene expression and orthology: Bgee, a gene expression database, and Orthologous MAtrix (OMA), an orthology database.

METHODS

This study combines RIKEN BioResource data with Resource Description Framework (RDF) datasets from Bgee (gene expression), OMA (orthology), DisGeNET (human gene-disease associations), Mouse Genome Informatics (MGI), UniProt, and four disease ontologies in the RIKEN BioResource MetaDatabase. Our aim is to evaluate distributed SPARQL query performance when exploring, across these interoperable datasets, which model organisms are most appropriate for specific medical research applications. More precisely, in our biomedical use cases we investigate disease-related genes and the anatomical parts where these genes are expressed, and then identify bioresource candidates available for research on the specific disease.
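The chain of lookups described above (disease to gene, gene to expression site, human gene to mouse ortholog, mouse gene to bioresource) can be sketched as a set of in-memory joins. This is only an illustrative sketch, not the authors' implementation: the dictionaries stand in for the RDF datasets, the strain identifiers such as "BRC-0001" are invented, and the function name `candidate_bioresources` is hypothetical.

```python
# Toy stand-ins for the integrated datasets. The gene symbols and the
# prefrontal cortex example come from the abstract; the strain identifiers
# ("BRC-0001", ...) are invented for illustration only.
DISEASE_GENES = {"Alzheimer's disease": ["APP", "APOE"]}            # gene-disease associations
EXPRESSED_IN = {"APP": ["prefrontal cortex"],                       # gene expression
                "APOE": ["prefrontal cortex"]}
HUMAN_TO_MOUSE = {"APP": "App", "APOE": "Apoe"}                     # orthology
MOUSE_STRAINS = {"App": ["BRC-0001"], "Apoe": ["BRC-0002", "BRC-0003"]}  # bioresources


def candidate_bioresources(disease, anatomy):
    """Return (human gene, mouse ortholog, strain) tuples for a disease and tissue."""
    results = []
    for gene in DISEASE_GENES.get(disease, []):
        # Keep only genes expressed in the anatomical part of interest.
        if anatomy not in EXPRESSED_IN.get(gene, []):
            continue
        mouse_gene = HUMAN_TO_MOUSE.get(gene)
        # Collect every strain linked to the mouse ortholog.
        for strain in MOUSE_STRAINS.get(mouse_gene, []):
            results.append((gene, mouse_gene, strain))
    return results


if __name__ == "__main__":
    for row in candidate_bioresources("Alzheimer's disease", "prefrontal cortex"):
        print(row)
```

In the study itself each of these joins is a graph pattern in a SPARQL query over the corresponding RDF dataset rather than a dictionary lookup.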

RESULTS

We illustrate this approach with two use cases, one targeting Alzheimer's disease and one targeting melanoma. For Alzheimer's disease, we identified 14 disease-related genes expressed in the prefrontal cortex (e.g., APP and APOE) and 55 RIKEN bioresources, all genetically modified mice related to these genes, that are predicted to be relevant to Alzheimer's disease research. For melanoma, a transitive search over Uberon terms using SPARQL property paths identified 14 melanoma-related genes (e.g., HRAS and PTEN) and 12 anatomical parts in which these genes are expressed, such as the "skin of limb". Finally, we compared the performance of a federated SPARQL query sent to the remote Bgee SPARQL endpoint with that of a centralized SPARQL query run against a copy of the Bgee dataset held in the RIKEN BioResource MetaDatabase.
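To make the transitive search concrete, the sketch below emulates what a zero-or-more property path such as `rdfs:subClassOf*` returns: the starting term plus everything reachable beneath it. It is a minimal sketch under stated assumptions. The edge list is a toy hierarchy rather than the actual Uberon structure, and the abstract does not say which predicate the authors traversed, so `rdfs:subClassOf` and the helper names are assumptions.

```python
from collections import deque

# Toy child -> parent edges. The labels are illustrative and do not
# reproduce the actual Uberon hierarchy.
TOY_SUBCLASS_OF = [
    ("zone of skin", "skin"),
    ("skin of limb", "zone of skin"),
    ("skin of forelimb", "skin of limb"),
    ("liver", "organ"),
]


def zero_or_more_descendants(root, edges):
    """Emulate `?term rdfs:subClassOf* <root>`: root plus all transitive subclasses."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    seen = {root}            # a zero-or-more path always matches the root itself
    queue = deque([root])
    while queue:             # breadth-first walk down the hierarchy
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def property_path_pattern(root_iri):
    """Return the SPARQL 1.1 triple pattern that asks a triple store for the same closure."""
    return f"?anat rdfs:subClassOf* <{root_iri}> ."


if __name__ == "__main__":
    print(sorted(zero_or_more_descendants("skin", TOY_SUBCLASS_OF)))
```

With a property path the triple store computes this closure itself, so one query can match genes expressed in a broad anatomical term and in any of its more specific subterms.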

CONCLUSIONS

We confirmed that query performance degraded under the federated approach. We also reduced this degradation for federated queries from the RIKEN BioResource MetaDatabase to the SIB by refining the transferred data through a subquery and by enhancing the server specifications, thereby optimizing triple store query evaluation.
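The idea of refining the transferred data with a subquery can be sketched as follows: a local subquery first narrows the gene list, and only then does a `SERVICE` clause consult the remote endpoint. This is a hedged sketch, not the query used in the study. The `ex:` predicates, the endpoint address `https://example.org/sparql`, and the function names are placeholders, and whether the subquery is actually evaluated before the `SERVICE` call depends on the triple store.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_federated_query(service_iri, disease_label, limit=50):
    """Build a federated query whose local subquery narrows genes before SERVICE.

    The ex: predicates are hypothetical placeholders, not the vocabularies
    used by the RIKEN BioResource MetaDatabase or Bgee.
    """
    label = escape_literal(disease_label)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/vocab#>
SELECT ?gene ?anat WHERE {{
  {{
    # Local subquery: keep only genes linked to the disease of interest.
    SELECT DISTINCT ?gene WHERE {{
      ?disease rdfs:label "{label}" .
      ?gene ex:associatedWithDisease ?disease .
    }}
    LIMIT {int(limit)}
  }}
  # Remote part: look up expression sites for the reduced gene set.
  SERVICE <{service_iri}> {{
    ?gene ex:expressedIn ?anat .
  }}
}}
"""


if __name__ == "__main__":
    # Placeholder endpoint address; the abstract does not give the real one.
    print(build_federated_query("https://example.org/sparql", "melanoma", limit=10))
```

Keeping the `DISTINCT` projection and the `LIMIT` inside the subquery is one way to bound how many bindings have to be exchanged with the remote endpoint.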

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0895/12082848/622cdc3875b8/12911_2025_3013_Fig1_HTML.jpg