Yang Lianping, Zhang Weilin
1 College of Sciences, Northeastern University, Shenyang, China.
2 Department of Mathematics, New York University Shanghai, Shanghai, China.
J Comput Biol. 2017 Apr;24(4):299-310. doi: 10.1089/cmb.2016.0030. Epub 2016 Dec 19.
How to describe the similarity relationship between biological sequences is a basic but important problem in bioinformatics. This article proposes the first graphical representation method for the similarity relationship itself, rather than for a single sequence, which makes the similarity intuitive. Properties of the similarity such as sensitivity and continuity are proved theoretically, indicating that the similarity descriptor combines the advantages of alignment-based and alignment-free methods. With the aid of multiresolution analysis tools, the similarity can be displayed in different profiles, from high resolution to low resolution. On this basis, the idea of multiresolution clustering is introduced for the first time. The method is tested with a reassortment analysis on a benchmark flu virus genome data set, where it performs better than the alignment method, especially on problems involving the order of segments.