Xie Jiang, Wang Minchao, Dai Dongbo, Zhang Huiran, Zhang Wu
School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China.
Annu Int Conf IEEE Eng Med Biol Soc. 2012;2012:6329-32. doi: 10.1109/EMBC.2012.6347441.
Detecting protein families in large-scale databases is a difficult but important biological problem, and computational clustering methods can address it effectively. Although many clustering algorithms exist, most rely solely on a threshold. Their computational performance is strongly affected by the weight distribution, and they are valid only for certain special networks. This paper proposes a new network clustering algorithm, Markov Finding and Clustering (MFC), to cluster proteins accurately into their functionally specific families. The MFC algorithm improves the random walk process and reduces the effect of noise on the clustering result. It performs well on networks that are not handled well by existing noise-sensitive algorithms. Finally, experiments on protein sequence datasets demonstrate that the algorithm is effective at detecting protein families and outperforms current algorithms.
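To make the criticism of threshold-based methods concrete, the following minimal sketch shows a generic threshold clustering, not the MFC algorithm, whose details the abstract does not give. It keeps only edges whose similarity weight reaches a cutoff and reports the connected components as clusters. The function name `threshold_clusters`, the node labels, and the edge weights are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def threshold_clusters(edges, threshold):
    """Cluster nodes as connected components over edges with weight >= threshold.

    `edges` is an iterable of (node_u, node_v, weight) tuples (hypothetical format).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen nodes.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in edges:
        root_u, root_v = find(u), find(v)  # register both endpoints
        if w >= threshold and root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical similarity network: family A = {a1, a2, a3}, family B = {b1, b2},
# plus one spurious (noisy) edge between a3 and b1.
edges = [
    ("a1", "a2", 0.9),
    ("a2", "a3", 0.6),
    ("a3", "b1", 0.5),  # noise
    ("b1", "b2", 0.8),
]

for cutoff in (0.4, 0.55, 0.7):
    print(cutoff, threshold_clusters(edges, cutoff))
```

In this toy network only the middle cutoff recovers the two intended families: the lowest one lets the noisy edge merge them, and the highest one splits a3 away from its family. This is the sensitivity to the weight distribution that the abstract attributes to threshold-based methods.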