Gioutlakis Aris, Klapa Maria I, Moschonas Nicholas K
Department of General Biology, School of Medicine, University of Patras, Patras, Greece.
Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research and Technology-Hellas (FORTH/ICE-HT), Patras, Greece.
PLoS One. 2017 Oct 12;12(10):e0186039. doi: 10.1371/journal.pone.0186039. eCollection 2017.
It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, which present PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after being converted to a certain target level. Hence, the node set of the integrated network depends, in each case, on the number and type of the datasets combined. Moreover, the irreversible a priori normalization process hinders the identification, in the integrated network, of normalization artifacts that originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and of filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries), and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, which is then used to identify normalization biases; it also enables the appraisal of potential false positive interactions through PPI source database cross-checking.
The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes.
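The integration scheme summarized above (an ontology network linking genes, mRNAs and proteins, PPI records superimposed on it unchanged, and reversible normalization to a chosen level) can be illustrated with a minimal Python sketch. It is not PICKLE's implementation. All identifiers (GENE_A, mRNA_A1, P1 and so on) and source names (DB1 to DB3) are invented, the ontology is a toy, and the rule for flagging candidate normalization artifacts is a simplified heuristic assumed here for illustration: a normalized pair is flagged when every source supporting it reached it only through a one-to-many projection.

```python
from collections import defaultdict
from itertools import product

# Order of the genetic information flow: gene -> mRNA -> protein.
LEVELS = ("gene", "mrna", "protein")

# Hypothetical toy ontology; every identifier is invented for illustration.
GENE_TO_MRNA = {
    "GENE_A": {"mRNA_A1", "mRNA_A2"},  # one gene, two transcripts
    "GENE_B": {"mRNA_B1"},
    "GENE_C": {"mRNA_C1"},
    "GENE_D": {"mRNA_D1"},
}
MRNA_TO_PROTEIN = {
    "mRNA_A1": {"P1"},
    "mRNA_A2": {"P2"},
    "mRNA_B1": {"P3"},
    "mRNA_C1": {"P4"},
    "mRNA_D1": {"P4"},  # two genes encode the same protein
}


def invert(mapping):
    """Reverse a one-to-many mapping so the ontology can be walked upstream."""
    inverse = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverse[value].add(key)
    return dict(inverse)


# Edges between adjacent levels of the ontology network, in both directions.
STEP = {
    ("gene", "mrna"): GENE_TO_MRNA,
    ("mrna", "protein"): MRNA_TO_PROTEIN,
    ("mrna", "gene"): invert(GENE_TO_MRNA),
    ("protein", "mrna"): invert(MRNA_TO_PROTEIN),
}


def project(level, ident, target):
    """Return every target-level identifier linked to (level, ident) in the ontology."""
    current = {ident}
    i, j = LEVELS.index(level), LEVELS.index(target)
    direction = 1 if j > i else -1
    while i != j:
        step = STEP[(LEVELS[i], LEVELS[i + direction])]
        current = {t for c in current for t in step.get(c, set())}
        i += direction
    return current


# Hypothetical PPI records kept exactly as reported: (source, interactor, interactor).
RECORDS = [
    ("DB1", ("gene", "GENE_A"), ("gene", "GENE_B")),
    ("DB2", ("protein", "P1"), ("protein", "P3")),
    ("DB3", ("protein", "P4"), ("protein", "P3")),
]


def normalize(records, target):
    """Build a view of the untouched records at one reference level.

    Each normalized pair keeps the set of supporting sources and the subset of
    sources whose record reached the pair through a one-to-many projection.
    """
    network = {}
    for source, (lvl_a, id_a), (lvl_b, id_b) in records:
        side_a = project(lvl_a, id_a, target)
        side_b = project(lvl_b, id_b, target)
        expanded = len(side_a) > 1 or len(side_b) > 1
        # Interactors absent from the ontology yield an empty side and drop out.
        for x, y in product(side_a, side_b):
            pair = tuple(sorted((x, y)))
            entry = network.setdefault(pair, {"sources": set(), "via_expansion": set()})
            entry["sources"].add(source)
            if expanded:
                entry["via_expansion"].add(source)
    return network


def potential_artifacts(network):
    """Pairs supported only by records that had to be expanded to reach them."""
    return sorted(p for p, e in network.items() if e["sources"] == e["via_expansion"])


if __name__ == "__main__":
    for target in ("protein", "gene"):
        net = normalize(RECORDS, target)
        flagged = set(potential_artifacts(net))
        for pair in sorted(net):
            support = sorted(net[pair]["sources"])  # source cross-check
            note = "  <- potential artifact" if pair in flagged else ""
            print(f"{target}: {pair[0]}-{pair[1]} sources={support}{note}")
```

Because `RECORDS` is never rewritten, the same data can be viewed at either level simply by calling `normalize` with a different target. In this toy example the gene-level DB1 report yields an extra P2-P3 pair at the protein level that no other source confirms, while the protein-level DB3 report splits across GENE_C and GENE_D at the gene level. Both are instances of the nonlinearity-driven effects the abstract describes, although the flagging rule used here is only illustrative.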