Identifying protein function and functional links based on large-scale co-occurrence patterns.

Author information

Division of Identification and Forensic Science, Israel Police, Jerusalem, Israel.

Faculty of Management of Technology, Holon Institute of Technology, Holon, Israel.

Publication information

PLoS One. 2022 Mar 3;17(3):e0264765. doi: 10.1371/journal.pone.0264765. eCollection 2022.

Abstract

OBJECTIVE

The vast majority of known proteins have not been experimentally tested even at the level of measuring their expression, and the function of many proteins remains unknown. In order to decipher protein function and examine functional associations, we developed "Cliquely", a software tool based on the exploration of co-occurrence patterns.

COMPUTATIONAL MODEL

Using a set of more than 23 million proteins divided into 404,947 orthologous clusters, we explored the co-occurrence graph of 4,742 fully sequenced genomes from the three domains of life. Edge weights in this graph represent co-occurrence probabilities. We used the Bron-Kerbosch algorithm to detect maximal cliques in this graph; these fully connected subgraphs represent meaningful biological networks from different functional categories.
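The pipeline described above can be illustrated with a minimal Python sketch. The abstract does not define how co-occurrence probabilities are computed, so the Jaccard index used here is an assumption, as are the function names, the `threshold` parameter, and the toy genome contents. Only the use of Bron-Kerbosch for maximal cliques comes from the text; the pivoting variant shown is one common form of that algorithm.

```python
from itertools import combinations


def cooccurrence_weights(genomes):
    """Score each pair of clusters by how often they share genomes.

    ASSUMPTION: the Jaccard index stands in for the paper's
    co-occurrence probability, which the abstract does not define.
    `genomes` maps a genome id to the set of cluster ids it contains.
    """
    presence = {}
    for genome_id, clusters in genomes.items():
        for cluster in clusters:
            presence.setdefault(cluster, set()).add(genome_id)
    weights = {}
    for a, b in combinations(sorted(presence), 2):
        shared = len(presence[a] & presence[b])
        if shared:
            weights[(a, b)] = shared / len(presence[a] | presence[b])
    return weights


def build_graph(weights, threshold):
    """Keep only edges whose weight reaches the (hypothetical) threshold."""
    adjacency = {}
    for (a, b), weight in weights.items():
        if weight >= threshold:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


def bron_kerbosch(adjacency):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adjacency[v] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Toy input, invented for illustration only.
genomes = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B", "C"},
    "g3": {"A", "B", "D"},
    "g4": {"D", "E"},
}
weights = cooccurrence_weights(genomes)
strict = bron_kerbosch(build_graph(weights, 0.6))
relaxed = bron_kerbosch(build_graph(weights, 0.5))
```

With these toy genomes, raising the threshold drops the weaker D-E edge, so the stricter graph yields fewer cliques. Bron-Kerbosch has exponential worst-case running time, so a graph at the scale described would presumably need thresholding or other pruning before enumeration.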

MAIN RESULTS

We demonstrate that Cliquely can successfully identify known networks from various pathways, including nitrogen fixation, glycolysis, methanogenesis, mevalonate, and ribosomal proteins. In identifying the virulence-associated type III secretion system (T3SS) network, Cliquely also added 13 previously uncharacterized proteins to that network, demonstrating the strength of this approach. Cliquely is freely available and open source. Users can employ the tool to explore co-occurrence networks using a protein of interest and a customizable level of stringency, either for the entire dataset or for one of the three domains: Archaea, Bacteria, or Eukarya.
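The query mode described above (a protein of interest plus a stringency level) can be sketched as follows. This is not Cliquely's actual interface: the function names, the `stringency` parameter as a minimum edge weight, and the cluster ids and weights are all hypothetical. The sketch seeds the Bron-Kerbosch recursion with the query cluster, so that only maximal cliques containing it are enumerated.

```python
def cliques_containing(adjacency, seed):
    """Return the maximal cliques that include `seed` (hypothetical helper)."""
    if seed not in adjacency:
        return []
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    # Starting from R = {seed} and P = neighbours of seed restricts the
    # search to cliques that contain the seed.
    expand({seed}, set(adjacency[seed]), set())
    return found


def query(weights, seed, stringency):
    """Threshold the weighted edges, then list cliques around `seed`."""
    adjacency = {}
    for (a, b), weight in weights.items():
        if weight < stringency:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return sorted(cliques_containing(adjacency, seed), key=len, reverse=True)


# Invented cluster ids and weights, for illustration only.
weights = {
    ("OG1", "OG2"): 0.90,
    ("OG1", "OG3"): 0.85,
    ("OG2", "OG3"): 0.80,
    ("OG1", "OG4"): 0.70,
    ("OG2", "OG4"): 0.65,
    ("OG3", "OG4"): 0.40,
    ("OG4", "OG5"): 0.75,
}
strict_hits = query(weights, "OG1", 0.8)
relaxed_hits = query(weights, "OG1", 0.6)
```

Lowering the stringency admits weaker edges and can therefore surface additional candidate partners for the seed, at the cost of more spurious links. Restricting a query to one domain would presumably require edge weights computed from that domain's genomes only, which this sketch does not model.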


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9506/8893610/9501fe829139/pone.0264765.g001.jpg