Sferra Gabriella, Fratini Federica, Ponzi Marta, Pizzi Elisabetta
Dipartimento di Malattie Infettive, Parassitarie e Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Rome, Italy.
BMC Bioinformatics. 2017 Sep 5;18(1):396. doi: 10.1186/s12859-017-1815-5.
Developing powerful methods to predict functional and/or physical protein-protein interactions from genome sequences is one of the main tasks of the post-genomic era. Phylogenetic profiling allows protein-protein interactions to be predicted at the whole-genome level in both prokaryotes and eukaryotes, and for this reason it is considered one of the most promising methods.
Here, we propose an improvement of phylogenetic profiling that can handle large genomic datasets and infer global protein-protein interactions. The method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation, which makes the measure applicable to large genomic datasets. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods that use mutual information or Pearson's correlation as the measure of profile similarity.
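To illustrate the similarity measure, the sample distance correlation (as defined by Székely, Rizzo and Bakirov) between two phylogenetic profiles can be sketched as below. Each profile is assumed to be a numeric vector over the same set of genomes, for example presence (1) or absence (0) of a homolog. That encoding, the function names and the plain-Python implementation are assumptions for illustration, not the authors' Phylo-dCor R scripts.

```python
from __future__ import annotations

import math
from typing import Sequence


def double_centered_distances(values: Sequence[float]) -> list[list[float]]:
    """Double-center the matrix of pairwise distances |v_j - v_k| of one profile."""
    n = len(values)
    d = [[abs(values[j] - values[k]) for k in range(n)] for j in range(n)]
    # d is symmetric, so its row means are also its column means.
    means = [sum(row) / n for row in d]
    grand = sum(means) / n
    return [[d[j][k] - means[j] - means[k] + grand for k in range(n)] for j in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation between two profiles over the same genomes.

    Hypothetical helper for illustration; profiles are assumed to be numeric
    vectors indexed by genome (e.g. 1 = homolog present, 0 = absent).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must cover the same genomes (at least two)")
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    n2 = len(x) ** 2
    dcov2 = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n2  # dCov^2
    dvar_x = sum(p * p for row in a for p in row) / n2  # dVar^2(x)
    dvar_y = sum(q * q for row in b for q in row) / n2  # dVar^2(y)
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant profile (e.g. present in every genome) is uninformative
    # max() guards against a tiny negative value caused by rounding
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))
```

Because only pairwise distances enter the statistic, a binary profile and its exact complement (present wherever the other is absent) score 1 up to floating-point rounding, whereas Pearson's correlation for that pair would be -1. More generally, the population distance correlation is zero only when the two variables are independent, whereas Pearson's correlation measures only linear association.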
In this work, we constructed and assessed robust reference sets and proposed the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
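How Phylo-dCor distributes its work is not specified here. One plausible scheme, sketched below purely as an assumption, exploits the fact that every protein pair is scored independently: each profile is double-centered once (since it takes part in many pairs), and the pair scores are then mapped over any `concurrent.futures.Executor`. The names `all_pairs_dcor`, `double_center` and `_dcor_from_centered` are hypothetical, and this is plain Python rather than the authors' R code.

```python
from __future__ import annotations

import math
from concurrent.futures import Executor
from itertools import combinations
from typing import Optional, Sequence


def double_center(values: Sequence[float]) -> list[list[float]]:
    """Double-centered matrix of pairwise |v_j - v_k| distances for one profile."""
    n = len(values)
    d = [[abs(values[j] - values[k]) for k in range(n)] for j in range(n)]
    means = [sum(row) / n for row in d]  # row means == column means (d is symmetric)
    grand = sum(means) / n
    return [[d[j][k] - means[j] - means[k] + grand for k in range(n)] for j in range(n)]


def _dcor_from_centered(task: tuple[list[list[float]], list[list[float]]]) -> float:
    """Distance correlation from two precomputed double-centered matrices."""
    a, b = task
    # The 1/n^2 normalizations cancel in the ratio below, so they are omitted.
    dcov2 = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    dvar_a = sum(p * p for row in a for p in row)
    dvar_b = sum(q * q for row in b for q in row)
    if dvar_a == 0 or dvar_b == 0:
        return 0.0  # constant profile: no information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_a * dvar_b))


def all_pairs_dcor(
    profiles: dict[str, Sequence[float]],
    executor: Optional[Executor] = None,
) -> dict[tuple[str, str], float]:
    """Score every unordered protein pair; pairs are independent tasks."""
    if len({len(p) for p in profiles.values()}) > 1:
        raise ValueError("all profiles must cover the same genomes")
    # Center each profile once instead of once per pair it appears in.
    centered = {name: double_center(p) for name, p in profiles.items()}
    pairs = list(combinations(sorted(profiles), 2))
    tasks = [(centered[p], centered[q]) for p, q in pairs]
    # Serial map by default; any Executor (threads or processes) can be injected.
    mapper = map if executor is None else executor.map
    return dict(zip(pairs, mapper(_dcor_from_centered, tasks)))
```

With `executor=None` the pairs are scored serially, which is convenient for testing; for CPU parallelism one would pass a `concurrent.futures.ProcessPoolExecutor`, created under an `if __name__ == "__main__":` guard. Each task here ships two n×n matrices to a worker, so with many genomes it may be cheaper to send only profile names and keep the centered matrices in worker-side state.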