Chen Feng, Zhang Yuhong, Chen Yi-Ping Phoebe
College of Information Science and Engineering, Henan University of Technology, Zhengzhou, China; Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia.
College of Information Science and Engineering, Henan University of Technology, Zhengzhou, China.
Comput Biol Med. 2014 May;48:109-18. doi: 10.1016/j.compbiomed.2014.02.004. Epub 2014 Feb 28.
In multiple genome fragments, a globally important pattern is a zone marked by a significant change that has a similar impact on every related fragment in the zone. Such a zone may represent cancer-related genes involved in diverse tumors. Globally important zones have two characteristics: (1) they contain more data points than other regions of the fragments; and (2) their data points are distributed evenly across as many genome fragments as possible. A method for mining globally important zones should (1) be independent of the data distribution, (2) filter noise, (3) identify pattern boundaries, and (4) rank zones. We have developed a hierarchical, density-based method called GIZFinder (globally important zone finder) that detects and ranks such zones according to two criteria: distribution width and distribution depth. Comparisons on simulated data show that our method performs significantly better than the kernel framework and the sliding-window approach. In experiments on real cancer gene data, we identified 53 novel cancer genes, some of which have already been confirmed.
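To make the two ranking criteria concrete, here is a minimal Python sketch. It is not GIZFinder and does not reproduce its hierarchical, density-based detection step. It assumes each data point is a `(fragment_id, position)` pair, that candidate zones are closed position intervals already produced by some detection step, and that distribution width means the number of distinct fragments with at least one point in the zone while distribution depth means the total number of points in the zone. The names `Zone`, `zone_stats`, and `rank_zones`, and the width-then-depth ordering, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A candidate zone as a closed interval of positions (hypothetical representation)."""
    start: int
    end: int  # inclusive


def zone_stats(zone, points):
    """Return (width, depth) for one zone.

    points: a list of (fragment_id, position) pairs.
    width: number of distinct fragments with at least one point inside the zone.
    depth: total number of points inside the zone.
    """
    inside = [(frag, pos) for frag, pos in points if zone.start <= pos <= zone.end]
    width = len({frag for frag, _ in inside})
    depth = len(inside)
    return width, depth


def rank_zones(zones, points):
    """Sort zones by width, then depth, both descending (assumed ordering)."""
    return sorted(zones, key=lambda z: zone_stats(z, points), reverse=True)


if __name__ == "__main__":
    pts = [("f1", 10), ("f2", 12), ("f3", 11),
           ("f1", 50), ("f1", 51), ("f1", 52), ("f1", 53)]
    a, b = Zone(9, 13), Zone(49, 54)
    print(zone_stats(a, pts), zone_stats(b, pts))  # (3, 3) (1, 4)
    print(rank_zones([b, a], pts))                 # zone a ranks first
```

In the usage example, zone `b` holds more points, but they all sit on one fragment. Zone `a` spans three fragments, so under the assumed ordering it ranks higher. This reflects the idea that a globally important zone should be both dense and spread across many fragments.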