Stojmirović Aleksandar, Yu Yi-Kuo
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
J Comput Biol. 2009 Apr;16(4):579-610. doi: 10.1089/cmb.2008.0100.
We introduce a geometric framework suitable for studying the relationships among biological sequences. In contrast to previous works, our formulation allows asymmetric distances (quasi-metrics), originating from uneven weighting of strings, which may induce non-trivial partial orders on sets of biosequences. The distances considered are more general than traditional generalized string edit distances. In particular, our framework enables non-trivial conversion between sequence similarities, both local and global, and distances. Our constructions apply to a wide class of scoring schemes and require much less restrictive gap penalties than the ones regularly used. Numerous examples are provided to illustrate the concepts introduced and their potential applications.
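As a rough illustration of how an asymmetric distance can arise from similarity scores, the sketch below uses the conversion d(x, y) = s(x, x) - s(x, y) on single letters. The abstract does not state which conversion the paper uses, so this formula, the three-letter alphabet, the score values, and the names `SCORES`, `similarity`, `distance`, and `is_quasi_metric` are all assumptions made for the example. Because the self-scores s(x, x) differ between letters, d(x, y) and d(y, x) differ even though s itself is symmetric.

```python
from itertools import product

# Hypothetical alphabet and similarity scores, chosen only for illustration.
ALPHABET = ("A", "B", "C")
SCORES = {
    ("A", "A"): 4, ("B", "B"): 6, ("C", "C"): 9,  # uneven self-scores
    ("A", "B"): 1, ("A", "C"): 0, ("B", "C"): 2,  # symmetric off-diagonal scores
}


def similarity(x, y):
    """Look up s(x, y), treating the score table as symmetric."""
    return SCORES[(x, y)] if (x, y) in SCORES else SCORES[(y, x)]


def distance(x, y):
    """Assumed conversion: d(x, y) = s(x, x) - s(x, y)."""
    return similarity(x, x) - similarity(x, y)


def is_quasi_metric(points):
    """Check the quasi-metric axioms by brute force over all pairs and triples."""
    for x, y in product(points, repeat=2):
        if distance(x, y) < 0:
            return False
        # Separation: d(x, y) = d(y, x) = 0 must force x == y, and d(x, x) = 0.
        both_zero = distance(x, y) == 0 and distance(y, x) == 0
        if both_zero != (x == y):
            return False
    for x, y, z in product(points, repeat=3):
        # Triangle inequality, with the order of arguments respected.
        if distance(x, z) > distance(x, y) + distance(y, z):
            return False
    return True


if __name__ == "__main__":
    print(distance("A", "B"), distance("B", "A"))  # asymmetric pair
    print(is_quasi_metric(ALPHABET))
```

With these toy scores the distance from A to B is smaller than the distance from B to A, and symmetry is the only metric axiom that fails. A score table that violated s(x, y) + s(y, z) <= s(x, z) + s(y, y) for some triple would fail the triangle check instead.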