UMR INSERM unité U722 and Université Denis Diderot-Paris 7, Faculté de médecine, site Xavier Bichat, 16 rue Henri Huchard, 75870 Paris cedex 18, France.
Evol Bioinform Online. 2011;7:61-85. doi: 10.4137/EBO.S7048. Epub 2011 Jun 7.
Whatever the phylogenetic method, genetic sequences are usually described as strings of characters, so molecular sequences can be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the counterintuitive properties of high-dimensional spaces, such as the concentration of measure phenomenon.
To study how these properties might influence phylogeny reconstruction, we examined one popular method: the Fitch-Margoliash algorithm, which belongs to the Least Squares methods. We show that the Least Squares methods are closely related to Multidimensional Scaling. Indeed, the criteria for Fitch-Margoliash and Sammon's mapping are similar in form. However, the prolific research in Multidimensional Scaling has since produced methods that outperform Sammon's mapping.
Least Squares methods for tree reconstruction can now take advantage of these improvements. However, "false neighborhoods" and "tears" are the two main risks in dimensionality reduction: a "false neighborhood" occurs when data that are widely separated in the original space are found close together in the representation space, and a "tear" occurs when neighboring data are displayed in remote positions. To address this problem, we applied the concepts of "continuity" and "trustworthiness" to tree reconstruction, which limits the risk of "false neighborhoods" and "tears". We also point out the concentration of measure phenomenon as a source of error and introduce new criteria to build phylogenies with improved preservation of distances and robustness.
The authors and the Evolutionary Bioinformatics Journal dedicate this article to the memory of Professor W.M. Fitch (1929-2011).
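The criteria named in the abstract can be illustrated with a minimal Python sketch. It uses the commonly cited textbook forms of the Fitch-Margoliash weighted least squares criterion, Sammon's stress, and the rank-based trustworthiness and continuity measures; these are assumptions made for illustration and are not necessarily the exact formulations or the new criteria introduced in the article. All function names are hypothetical. `D` stands for the matrix of original pairwise distances and `d` for the distances read off the tree or the low-dimensional representation; off-diagonal entries of `D` are assumed to be strictly positive.

```python
import numpy as np


def fitch_margoliash(D, d):
    """Weighted least squares: sum over i<j of (D_ij - d_ij)^2 / D_ij^2."""
    i, j = np.triu_indices_from(D, k=1)
    return float(np.sum((D[i, j] - d[i, j]) ** 2 / D[i, j] ** 2))


def sammon_stress(D, d):
    """Sammon's stress: same residuals, weighted by 1/D_ij and normalised."""
    i, j = np.triu_indices_from(D, k=1)
    return float(np.sum((D[i, j] - d[i, j]) ** 2 / D[i, j]) / np.sum(D[i, j]))


def _ranks(M):
    """ranks[i, j] = rank of j among the neighbours of i (1 = nearest, self = 0)."""
    n = M.shape[0]
    M = M.astype(float).copy()
    np.fill_diagonal(M, -np.inf)  # each point sorts first in its own row
    order = np.argsort(M, axis=1, kind="stable")
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks


def trustworthiness(D, d, k):
    """Penalises false neighbourhoods: k-neighbours in d that are not k-neighbours in D.

    The normalisation used here assumes k < n / 2.
    """
    n = D.shape[0]
    r_in, r_out = _ranks(D), _ranks(d)
    intruders = (r_out >= 1) & (r_out <= k) & (r_in > k)
    penalty = np.sum((r_in - k)[intruders])
    return float(1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1)))


def continuity(D, d, k):
    """Penalises tears: k-neighbours in D that are no longer k-neighbours in d."""
    return trustworthiness(d, D, k)


if __name__ == "__main__":
    # Toy data (made up): four taxa placed on a line, then the last one pulled inwards.
    x_orig = np.array([0.0, 1.0, 3.0, 7.0])
    x_repr = np.array([0.0, 1.0, 3.0, 2.0])
    D = np.abs(x_orig[:, None] - x_orig[None, :])
    d = np.abs(x_repr[:, None] - x_repr[None, :])
    print("Fitch-Margoliash:", fitch_margoliash(D, d))
    print("Sammon stress:   ", sammon_stress(D, d))
    print("Trustworthiness: ", trustworthiness(D, d, k=1))
    print("Continuity:      ", continuity(D, d, k=1))
```

The two stress functions differ only in the weight applied to each squared residual (1/D² versus 1/D with a global normalisation), which is the sense in which the two criteria resemble each other. Trustworthiness and continuity ignore the magnitude of the distances and look only at neighbour ranks, so they can flag a distorted neighbourhood even when a stress value looks small.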