UMR INSERM unité U722 and Université Denis Diderot, Paris 7, Faculté de médecine, site Xavier Bichat, 16 rue Henri Huchard, 75870 Paris cedex 18, France.
Evol Bioinform Online. 2011;7:257-70. doi: 10.4137/EBO.S7565. Epub 2011 Nov 13.
Color can be very useful for visualising complex data. In taxonomy, color can help reveal how species' characteristics correlate with their classification. However, choosing the number of subclasses to display is often difficult. On the one hand, assigning a limited number of colors to taxa of interest hides the structure embedded in the subtrees of the taxonomy. On the other hand, giving each of a large number of taxa its own color without considering the underlying taxonomy may produce unreadable results, because the color code does not reflect the relationships between the displayed taxa. This paper proposes an automatic color coding scheme that visualises the levels of taxonomic relationships as an overlay on any kind of data plot. To this end, a dimensionality reduction method maps taxonomic "distances" onto a two-dimensional Euclidean space. The resulting map is then projected onto a 2D color space (the Hue, Saturation, Brightness colorimetric space with brightness set to 1). Proximity in the taxonomic classification corresponds to proximity on the map and is therefore rendered as color proximity. As a result, each species is assigned a color code that indicates its position in the taxonomic tree. The resulting tool, ColorPhylo, displays taxonomic relationships intuitively and can be combined with any biological result. A Matlab version of ColorPhylo is available at http://sy.lespi.free.fr/ColorPhylo-homepage.html. An ad hoc distance is also proposed for taxonomies with unknown edge lengths.
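The pipeline described above (taxonomic distances, then a 2D map, then HSB colors with brightness 1) can be sketched in a few lines of Python. This is an illustrative sketch, not the ColorPhylo implementation. The abstract does not name the dimensionality reduction method or define the ad hoc distance, so the sketch assumes classical multidimensional scaling and a unit-edge path length between leaves. The lineages, the function names (`tree_distance`, `classical_mds_2d`, `coords_to_rgb`) and the polar mapping (angle to hue, radius to saturation) are likewise assumptions made for the example.

```python
import colorsys
import math

# Toy, truncated lineages (illustrative only; not data from the paper).
lineages = {
    "E. coli": ("Bacteria", "Proteobacteria", "Escherichia"),
    "S. enterica": ("Bacteria", "Proteobacteria", "Salmonella"),
    "B. subtilis": ("Bacteria", "Firmicutes", "Bacillus"),
    "H. sapiens": ("Eukaryota", "Chordata", "Homo"),
}


def tree_distance(a, b):
    """Path length between two leaves, assuming every edge has length 1."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def classical_mds_2d(dist, iters=200):
    """Embed a symmetric distance matrix in 2D (classical MDS, an assumed choice).

    The two leading eigenvectors are found by power iteration with deflation,
    which is adequate when the leading eigenvalues are positive and distinct.
    """
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    axes = []
    for k in range(2):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(b, v)))  # Rayleigh quotient
        axes.append([math.sqrt(max(lam, 0.0)) * x for x in v])
        # Deflate so the next pass converges to the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return list(zip(axes[0], axes[1]))


def coords_to_rgb(coords):
    """Map 2D points to colors: angle -> hue, radius -> saturation, brightness = 1."""
    rmax = max(math.hypot(x, y) for x, y in coords) or 1.0
    colors = []
    for x, y in coords:
        hue = (math.atan2(y, x) / (2 * math.pi)) % 1.0
        sat = math.hypot(x, y) / rmax
        colors.append(colorsys.hsv_to_rgb(hue, sat, 1.0))
    return colors


names = list(lineages)
dist = [[tree_distance(lineages[a], lineages[b]) for b in names] for a in names]
coords = classical_mds_2d(dist)
colors = coords_to_rgb(coords)

for name, rgb in zip(names, colors):
    print(name, tuple(round(c, 3) for c in rgb))
```

In this toy example the two Proteobacteria land on essentially the same point of the map and therefore receive essentially the same color, while the more distant taxa are pushed toward different hues. The resulting RGB triples can be passed to any plotting library as per-point colors, which is the sense in which such a color code can be overlaid on an arbitrary data plot.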