A multi-layered approach to protein data integration for diabetes research.

Author information

McGarry Ken, Chambers James, Oatley Giles

Affiliation

School of Pharmacy, University of Sunderland, Wharncliffe Street, Sunderland SR1 3SD, UK.

Publication information

Artif Intell Med. 2007 Oct;41(2):129-43. doi: 10.1016/j.artmed.2007.07.009. Epub 2007 Sep 14.

DOI:10.1016/j.artmed.2007.07.009

PMID:17869073

Abstract

OBJECTIVE

Recent advances in high-throughput experimental techniques have enabled many protein-protein interactions to be identified and stored in large databases. Understanding protein interactions is fundamental to the advancement of science and medical knowledge; unfortunately, the scale of the data requires an automated approach to analysis. We describe our graph-mining techniques for identifying important structures within protein-protein interaction networks, to aid both human comprehension and computerised analysis.

METHODS AND MATERIALS

We describe our techniques for characterizing the type and associated properties of a graph constructed from data collated from the Human Protein Reference Database. Using random graph rewiring as a comparative technique, together with cross-validation against other identification methods, we present a further analysis of the identified essential proteins to illustrate the accuracy of these measures. We argue for using techniques based on graph structure to separate and encapsulate proteins according to functionality.
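
As an illustration of the rewiring comparison mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a degree-preserving double-edge swap as the randomisation step and the average local clustering coefficient as the property being compared; the abstract does not specify either choice. The edge list `toy_edges` and all function names are hypothetical.

```python
import random
from itertools import combinations


def to_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def rewire(edges, n_swaps, seed=0):
    """Randomise a graph by double-edge swaps, keeping every node's degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed swap: a-b, c-d  ->  a-d, c-b
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical toy network: two triangles joined by a bridge, plus a tail.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"),
             ("E", "F"), ("D", "F"), ("F", "G"), ("G", "H")]
original = to_adjacency(toy_edges)
randomised = to_adjacency(rewire(toy_edges, n_swaps=20, seed=1))
print("clustering, original:  ", average_clustering(original))
print("clustering, randomised:", average_clustering(randomised))
```

If the observed network has markedly higher clustering than its rewired counterparts, that difference cannot be explained by the degree sequence alone. In practice the comparison would be repeated over many random seeds rather than a single rewired graph.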

RESULTS

We demonstrate how rational Erdős numbers may be used to identify collaborating proteins based solely on network structure. Further, by using a dynamic cut-off limit, we demonstrate how collaboration subgraphs can be generated for each protein within the network, and how graph containment can be used to identify which of the many possible graphs are likely to be actual protein complexes. The demonstration protein interaction network built for diabetes is found to be a scale-free, small-world graph with a power-law distribution of interactions per node. These findings are consistent with many other protein interaction networks.
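
To make the power-law claim concrete, the sketch below estimates the exponent of a degree distribution from an edge list. It is an assumption-laden illustration, not the procedure used in the paper: it applies the standard approximate maximum-likelihood estimator for discrete power laws, alpha ≈ 1 + n / Σ ln(k_i / (k_min − 0.5)), which is known to be rough when `k_min` is small. The function names and the toy star graph are hypothetical.

```python
import math
from collections import Counter


def degree_sequence(edges):
    """Return the list of node degrees for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())


def powerlaw_alpha(degrees, k_min=1):
    """Approximate discrete MLE of the power-law exponent for k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical toy graph: one hub connected to four leaves.
star_edges = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4")]
degrees = degree_sequence(star_edges)
alpha = powerlaw_alpha(degrees, k_min=1)
print("degrees:", sorted(degrees), "estimated alpha:", round(alpha, 3))
```

A point estimate of the exponent does not by itself show that a network is scale-free; a goodness-of-fit test against alternative distributions would be needed to support that conclusion on real interaction data.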
