Univ Rennes, Inria, CNRS, IRISA-UMR 6074, F-35000 Rennes, France.
PEGASE, INRAE, Institut Agro, F-35590 Saint Gilles, France.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad257.
Molecular complexes play a major role in the regulation of biological pathways. The Biological Pathway Exchange format (BioPAX) facilitates the integration of data sources describing interactions, some of which involve complexes. The BioPAX specification explicitly prevents a complex from having any component that is itself a complex (unless that component is a black-box complex whose composition is unknown). However, we observed that the well-curated Reactome pathway database contains such recursive complexes of complexes. We propose reproducible and semantically rich SPARQL queries for identifying and fixing invalid complexes in BioPAX databases, and we evaluate the consequences of fixing these nonconformities in the Reactome database.
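The conformity rule above can be illustrated with a small sketch. This is not the authors' SPARQL implementation: it is a hypothetical in-memory model in which `Complex` and `invalid_complexes` are illustrative names, and a black-box complex is assumed to be one with no listed components. A complex is flagged as invalid when at least one of its components is a complex whose composition is known.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Complex:
    """Hypothetical stand-in for a BioPAX Complex (not a real library class)."""
    uri: str
    # URIs of the physical entities (proteins, small molecules, complexes) it contains
    components: list[str] = field(default_factory=list)

    @property
    def is_black_box(self) -> bool:
        # Assumption: a black-box complex (composition unknown) has no components listed.
        return not self.components


def invalid_complexes(complexes: dict[str, Complex]) -> set[str]:
    """Return URIs of complexes that have a non-black-box complex as a component."""
    invalid = set()
    for c in complexes.values():
        for comp_uri in c.components:
            sub = complexes.get(comp_uri)  # None means the component is not a complex
            if sub is not None and not sub.is_black_box:
                invalid.add(c.uri)
                break
    return invalid


# Toy data: P* are non-complex entities, C*/BB are complexes.
db = {
    "C1": Complex("C1", ["P1", "P2"]),
    "C2": Complex("C2", ["C1", "P3"]),  # contains a fully described complex: invalid
    "BB": Complex("BB"),                # black box: composition unknown
    "C3": Complex("C3", ["BB", "P4"]),  # contains only a black box: allowed
}
print(sorted(invalid_complexes(db)))  # ['C2']
```

In an RDF store, the same test would be expressed as a graph pattern over the BioPAX `component` property; the Python version only makes the rule explicit.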
In the Homo sapiens version of Reactome, we identify 5,833 recursively defined complexes out of 14,987 complexes (39%). This situation is not specific to the human dataset: all tested Reactome species contain between 30% (Plasmodium falciparum) and 40% (Sus scrofa, Bos taurus, Canis familiaris, and Gallus gallus) recursive complexes. As an additional benefit, the procedure also detects redundant complexes. Overall, by repairing the topology of the complexes in the graph, this method improves both the conformity of the graph and its automated analysis. This makes it possible to apply further reasoning methods to more consistent data.
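One plausible way to repair recursive complexes and expose redundancies is to flatten each complex into its non-complex constituents, then group complexes with identical flattened compositions. The sketch below is a hypothetical illustration, not the method from the notebook: `flatten` and `redundant_groups` are illustrative names, stoichiometries are assumed to multiply along nesting levels, black boxes are kept as opaque components, and two complexes are assumed redundant only when both constituents and stoichiometries match.

```python
from __future__ import annotations
from collections import Counter, defaultdict

# Hypothetical model: complex URI -> Counter(component URI -> stoichiometry).
# An empty Counter stands for a black-box complex (composition unknown).
Composition = dict[str, Counter]


def flatten(uri: str, db: Composition, _seen: frozenset = frozenset()) -> Counter:
    """Replace every non-black-box sub-complex by its own components, recursively."""
    if uri in _seen:
        raise ValueError(f"cycle detected at {uri}")
    flat: Counter = Counter()
    for comp, n in db[uri].items():
        sub = db.get(comp)
        if sub:  # a complex with known components: expand and scale stoichiometry
            for leaf, m in flatten(comp, db, _seen | {uri}).items():
                flat[leaf] += n * m
        else:    # a non-complex entity or a black box: keep it as a leaf
            flat[comp] += n
    return flat


def redundant_groups(db: Composition) -> list[list[str]]:
    """Group complexes whose flattened compositions are identical."""
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for uri, comps in db.items():
        if comps:  # black boxes have no composition to compare
            groups[frozenset(flatten(uri, db).items())].append(uri)
    return [sorted(g) for g in groups.values() if len(g) > 1]


db = {
    "C1": Counter({"P1": 1, "P2": 1}),
    "C2": Counter({"C1": 2, "P3": 1}),          # recursive: contains two copies of C1
    "C3": Counter({"P1": 2, "P2": 2, "P3": 1}), # same constituents as flattened C2
    "BB": Counter(),                            # black box
}
print(redundant_groups(db))  # [['C2', 'C3']]
```

Flattening removes the nesting that the specification forbids, and comparing the flattened compositions is what makes duplicates such as `C2` and `C3` visible.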
We provide a Jupyter notebook detailing the analysis at https://github.com/cjuigne/non_conformities_detection_biopax.