School of Information Science and Engineering, Central South University, Changsha, China.
Proteomics. 2013 Jan;13(2):291-300. doi: 10.1002/pmic.201200436. Epub 2013 Jan 3.
Clustering of protein-protein interaction networks is one of the most prevalent methods for identifying protein complexes, detecting functional modules, and predicting protein functions. Many clustering methods have been proposed in the past few years, but evaluating how well protein clusters are identified remains a challenging task. Even the two most popular measurements, F-measure and p-value, are biased when used to evaluate the identified clusters. In this paper, we propose two new measurements that evaluate clusters more finely and distinctly: hF-measure(Tf), a topology-free measurement, and hF-measure(Tb), a topology-based measurement. Unlike F-measure, hF-measure(Tf) and hF-measure(Tb) can discriminate between different types of errors. Both artificial and practical test data were used to evaluate the effectiveness of hF-measure(Tf) and hF-measure(Tb). For the artificial test data, artificial errors were generated by replacing some cluster members with functionally similar or non-similar members. The practical test data were produced by seven clustering algorithms: Markov Clustering, Molecular Complex Detection, HC-PIN, SPICI, CPM, Core-Attachment, and RRW. The experimental results on both artificial and practical test data show that hF-measure(Tf) and hF-measure(Tb) evaluate clusters more accurately than F-measure. In particular, hF-measure(Tb) can capture topology changes in clusters and can also be used for the analysis of dynamic networks.
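The limitation of F-measure described in the abstract can be illustrated with a minimal sketch. It assumes the common set-overlap definition of F-measure (the harmonic mean of precision and recall between a predicted cluster and a reference complex). The abstract does not define hF-measure(Tf) or hF-measure(Tb), so neither is implemented here. The protein names P1 to P4, S1, and N1 are hypothetical, as is the function name f_measure.

```python
def f_measure(predicted, reference):
    """Set-overlap F-measure between a predicted cluster and a reference complex."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)   # fraction of predicted members that are correct
    recall = overlap / len(reference)      # fraction of reference members that were recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference complex with four members.
reference = {"P1", "P2", "P3", "P4"}

# Two artificial errors, built as in the abstract by replacing one member:
# S1 stands for a functionally similar protein, N1 for a non-similar one.
similar_swap = {"P1", "P2", "P3", "S1"}
dissimilar_swap = {"P1", "P2", "P3", "N1"}

score_similar = f_measure(similar_swap, reference)
score_dissimilar = f_measure(dissimilar_swap, reference)

print(score_similar, score_dissimilar)
```

Because this definition counts only exact membership overlap, both replacements receive the same score. That is the kind of error the abstract says F-measure cannot tell apart and the proposed measurements are meant to distinguish.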