Saryan Preeti, Gupta Shubham, Gowda Vinita
Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal Bypass Road, Bhopal, Madhya Pradesh 462066, India.
Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, Karnataka 560012, India.
Appl Plant Sci. 2020 Jul 31;8(7):e11377. doi: 10.1002/aps3.11377. eCollection 2020 Jul.
Statistical methods used by most morphologists to validate species boundaries (such as principal component analysis [PCA] and non-metric multidimensional scaling [nMDS]) are limiting because these methods are mostly used as visualization methods, and because the groups are identified by taxonomists (i.e., supervised), adding human bias. Here, we use a spectral clustering algorithm for the unsupervised discovery of species boundaries followed by the analysis of the cluster-defining characters.
We used spectral clustering, nMDS, and PCA on 16 morphological characters within the genus to group 93 individuals from 10 taxa. A radial basis function kernel was used for the spectral clustering with user-specified tuning values (gamma). The goodness of the clusters discovered with each gamma value was quantified using the eigengap, a normalized mutual information score, and the Rand index. Finally, mutual information-based character selection and a t-test were used to identify cluster-defining characters.
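The clustering and scoring steps described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the helper names (`eigengap_k`, `cluster_over_gammas`), the gamma grid, the standardization step, the choice of the symmetric normalized Laplacian, and the restriction to at least two clusters are all assumptions, and the data are synthetic stand-ins shaped like the study (93 individuals, 16 characters).

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score, rand_score
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler


def eigengap_k(affinity, max_k=10):
    """Suggest a cluster count (2..max_k) from the largest gap between
    consecutive eigenvalues of the symmetric normalized graph Laplacian."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    eigvals = np.linalg.eigvalsh(laplacian)[: max_k + 1]  # ascending order
    gaps = np.diff(eigvals)
    # gaps[i] = eigvals[i + 1] - eigvals[i]; a large gaps[i] suggests k = i + 1.
    # Skip gaps[0] so that at least two clusters are returned (an assumption).
    return int(np.argmax(gaps[1:])) + 2


def cluster_over_gammas(X, reference_labels, gammas, max_k=10):
    """Run spectral clustering once per gamma and score each partition
    against reference (e.g. taxonomist-assigned) labels."""
    X_std = StandardScaler().fit_transform(X)  # assumed preprocessing
    results = []
    for gamma in gammas:
        affinity = rbf_kernel(X_std, gamma=gamma)  # radial basis function kernel
        k = eigengap_k(affinity, max_k=max_k)
        model = SpectralClustering(
            n_clusters=k, affinity="precomputed", random_state=0
        )
        labels = model.fit_predict(affinity)
        results.append({
            "gamma": gamma,
            "k": k,
            "nmi": normalized_mutual_info_score(reference_labels, labels),
            "rand": rand_score(reference_labels, labels),
            "labels": labels,
        })
    return results


# Synthetic stand-in data: 93 "individuals" x 16 "characters", 3 groups.
X, taxa = make_blobs(n_samples=93, n_features=16, centers=3,
                     cluster_std=2.0, random_state=0)
results = cluster_over_gammas(X, taxa, gammas=(0.01, 0.1, 1.0))
for r in results:
    print(r["gamma"], r["k"], round(r["nmi"], 3), round(r["rand"], 3))
```

Comparing the scores across gamma values gives a simple way to see how sensitive the discovered partition is to the kernel width; the reference labels are used only for scoring, not for fitting.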
Spectral clustering revealed five, nine, and 12 clusters of taxa in the species complexes examined here. Character selection identified at least four characters that defined these clusters.
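One way to picture the character selection step is the sketch below, which ranks characters by their mutual information with the cluster labels and then applies a t-test to a selected character. It is an illustration under stated assumptions rather than the published procedure: the function names, the one-cluster-versus-rest comparison, the use of Welch's unequal-variance t-test, and the synthetic data and placeholder character names are all hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import make_blobs
from sklearn.feature_selection import mutual_info_classif


def rank_characters(X, cluster_labels, names, top_n=4):
    """Return the top_n characters ranked by estimated mutual information
    between each character and the cluster labels."""
    mi = mutual_info_classif(X, cluster_labels, random_state=0)
    order = np.argsort(mi)[::-1][:top_n]
    return [(names[i], float(mi[i])) for i in order]


def one_vs_rest_ttests(X, cluster_labels, char_index):
    """Welch t-test p-value for one character, comparing each cluster
    against all remaining individuals (an assumed comparison scheme)."""
    p_values = {}
    for cluster in np.unique(cluster_labels):
        inside = X[cluster_labels == cluster, char_index]
        outside = X[cluster_labels != cluster, char_index]
        _, p = ttest_ind(inside, outside, equal_var=False)
        p_values[int(cluster)] = float(p)
    return p_values


# Synthetic stand-in data: 93 "individuals" x 16 "characters", 3 clusters.
X, clusters = make_blobs(n_samples=93, n_features=16, centers=3,
                         cluster_std=2.0, random_state=0)
names = [f"char_{i}" for i in range(X.shape[1])]  # placeholder names

top = rank_characters(X, clusters, names, top_n=4)
best_index = names.index(top[0][0])
p_values = one_vs_rest_ttests(X, clusters, best_index)
print(top)
print(p_values)
```

In a real analysis the cluster labels would come from the spectral clustering output, and the p-values would need a multiple-comparison correction if many characters and clusters are tested.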
Together with our proposed character analysis methods, spectral clustering enabled the unsupervised discovery of species boundaries along with an explanation of their biological significance. Our results suggest that spectral clustering combined with a character selection analysis can enhance morphometric analyses and is superior to current clustering methods for species delimitation.