Yu Feng, Yang Zhi, Hu Xiao, Sun Yuan, Lin Hong, Wang Jian
BMC Bioinformatics. 2015;16 Suppl 12(Suppl 12):S3. doi: 10.1186/1471-2105-16-S12-S3. Epub 2015 Aug 25.
Revealing protein complexes is important for understanding the principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. However, the limited number of known physical interactions may constrain protein complex detection.
New PPI networks are constructed by integrating existing PPI datasets with the large and readily available PPI data extracted from the biomedical literature. Less reliable PPIs are then filtered out based on the semantic similarity and topological similarity of the two interacting proteins. Finally, supervised learning protein complex detection (SLPC), which makes full use of the information in available known complexes, is applied to detect protein complexes on the new PPI networks.
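The integration and filtering step can be illustrated with a minimal sketch. The abstract does not specify the exact similarity measures, how they are combined, or the cut-off, so everything below is an assumption for illustration: the helper names (`build_network`, `filter_edges`) are hypothetical, topological similarity is taken to be the Jaccard overlap of the two proteins' closed neighbourhoods, semantic similarity is assumed to be precomputed elsewhere (for example from Gene Ontology annotations) and passed in as a lookup table, and the two scores are mixed linearly with a weight `alpha` and compared against a `threshold`.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def build_network(*edge_lists: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Merge several PPI edge lists (e.g. curated + literature-mined) into one
    undirected network. Hypothetical helper, not the authors' code."""
    edges: Set[Edge] = set()
    for edge_list in edge_lists:
        for a, b in edge_list:
            if a != b:  # drop self-interactions
                edges.add(frozenset((a, b)))
    return edges


def adjacency(edges: Set[Edge]) -> Dict[str, Set[str]]:
    """Build a neighbour map from the undirected edge set."""
    adj: Dict[str, Set[str]] = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def topological_similarity(adj: Dict[str, Set[str]], a: str, b: str) -> float:
    """Jaccard overlap of closed neighbourhoods (an assumed stand-in measure)."""
    na = adj[a] | {a}
    nb = adj[b] | {b}
    return len(na & nb) / len(na | nb)


def filter_edges(
    edges: Set[Edge],
    semantic_sim: Dict[Tuple[str, str], float],  # assumed precomputed, keyed by sorted pair
    alpha: float = 0.5,       # assumed weight of semantic vs. topological similarity
    threshold: float = 0.3,   # assumed reliability cut-off
) -> Set[Edge]:
    """Keep only edges whose combined similarity score reaches the threshold."""
    adj = adjacency(edges)
    kept: Set[Edge] = set()
    for edge in edges:
        a, b = sorted(edge)
        sem = semantic_sim.get((a, b), 0.0)  # missing annotation -> 0
        topo = topological_similarity(adj, a, b)
        if alpha * sem + (1 - alpha) * topo >= threshold:
            kept.add(edge)
    return kept
```

For example, merging a small curated triangle A-B-C with literature-mined edges C-D and D-E, and giving C-D a low semantic score, removes C-D because its endpoints share few neighbours and little annotation. In practice the similarity measures and threshold would need to match whatever the original method actually uses.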
The experimental results of SLPC on two different categories of yeast PPI networks demonstrate the effectiveness of the approach. Compared with the original PPI networks, the best average improvements in F-score, accuracy and maximum matching ratio (MMR) are 4.76, 6.81 and 15.75 percentage points, respectively. Compared with the denoised PPI networks, the best average improvements in F-score, accuracy and MMR are 3.91, 4.61 and 12.10 percentage points, respectively. Compared with ClusterONE, the state-of-the-art complex detection method, SLPC on the denoised extended PPI networks achieves average improvements of 26.02 and 22.40 percentage points in F-score and MMR, respectively.
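To make the three evaluation metrics concrete, here is a minimal sketch. The abstract does not define them, so the definitions below are assumptions based on those commonly used in the complex-detection literature (including ClusterONE's evaluation). The overlap score between two complexes A and B is |A∩B|²/(|A||B|). The F-score combines precision and recall, where a predicted and a reference complex "match" if their overlap score is at least `omega` (0.2 here is an assumed threshold). Accuracy is the geometric mean of sensitivity and positive predictive value. MMR is the total overlap score of a maximum-weight one-to-one matching between reference and predicted complexes, divided by the number of reference complexes. The matching is solved here by exact bitmask dynamic programming, which only suits small inputs; a real evaluation would use the Hungarian algorithm.

```python
from functools import lru_cache
from math import sqrt
from typing import List, Set

Complex = Set[str]


def overlap_score(a: Complex, b: Complex) -> float:
    """Neighbourhood affinity |A∩B|^2 / (|A||B|) between two complexes."""
    inter = len(a & b)
    return inter * inter / (len(a) * len(b))


def f_score(predicted: List[Complex], reference: List[Complex], omega: float = 0.2) -> float:
    """Harmonic mean of precision and recall under an assumed match threshold omega."""
    if not predicted or not reference:
        return 0.0
    precision = sum(
        any(overlap_score(p, r) >= omega for r in reference) for p in predicted
    ) / len(predicted)
    recall = sum(
        any(overlap_score(p, r) >= omega for p in predicted) for r in reference
    ) / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(predicted: List[Complex], reference: List[Complex]) -> float:
    """Geometric mean of sensitivity (Sn) and positive predictive value (PPV)."""
    t = [[len(r & p) for p in predicted] for r in reference]  # T[i][j] = shared proteins
    sn = sum(max(row) for row in t) / sum(len(r) for r in reference)
    col_totals = [sum(row[j] for row in t) for j in range(len(predicted))]
    col_max = [max(row[j] for row in t) for j in range(len(predicted))]
    if sum(col_totals) == 0:
        return 0.0
    ppv = sum(col_max) / sum(col_totals)
    return sqrt(sn * ppv)


def mmr(predicted: List[Complex], reference: List[Complex]) -> float:
    """Maximum matching ratio via exact bitmask DP (small inputs only)."""
    m = len(predicted)

    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        # Best total overlap for reference complexes i.. given used predictions.
        if i == len(reference):
            return 0.0
        result = best(i + 1, used)  # leave reference complex i unmatched
        for j in range(m):
            if not used & (1 << j):
                result = max(
                    result,
                    overlap_score(reference[i], predicted[j]) + best(i + 1, used | (1 << j)),
                )
        return result

    return best(0, 0) / len(reference)
```

With reference complexes {A, B, C} and {D, E} and predictions {A, B}, {D, E, F} and {X, Y}, each reference complex overlaps exactly one prediction with score 2/3. This gives an F-score of 0.8 (precision 2/3, recall 1), an MMR of 2/3 and an accuracy of √0.8.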
The experimental results show that integrating the newly accepted PPI data from the biomedical literature into both the original and the denoised PPI networks substantially improves the performance of SLPC. In addition, our protein complex detection method outperforms ClusterONE.