McHugh E S, Shinn A P, Kay J W
Department of Statistics, University of Glasgow, Scotland, UK.
Parasitology. 2000 Sep;121(Pt 3):315-23. doi: 10.1017/s0031182099006381.
The identification and discrimination of 2 closely related and morphologically similar species of Gyrodactylus, G. salaris and G. thymalli, were assessed using two statistical classification methods, Linear Discriminant Analysis (LDA) and k-Nearest Neighbours (KNN). Both methods were applied to morphometric measurements made on the gyrodactylid attachment hooks. The mean estimated percentages of correct classification were 98.1% (LDA) and 97.9% (KNN) for G. salaris and 99.9% (LDA) and 73.2% (KNN) for G. thymalli. When the analysis was expanded to include another 2 closely related species, the classification efficiencies were 94.6% (LDA) and 98% (KNN) for G. salaris; 98.2% (LDA) and 72.6% (KNN) for G. thymalli; 86.7% (LDA) and 91.8% (KNN) for G. derjavini; and 76.5% (LDA) and 77.7% (KNN) for G. truttae. The higher correct classification scores obtained by the LDA classifier for G. salaris and G. thymalli in the 2-species analysis than in the 4-species analysis suggested the development of a 2-stage classifier. The mean estimated correct classification scores were 99.97% (LDA) and 99.99% (KNN) for the G. salaris-G. thymalli pairing and 99.4% (LDA) and 99.92% (KNN) for the G. derjavini-G. truttae pairing. The 2-stage classifier also performed very well when assessed using only marginal hook data, with classification efficiencies of 100% (LDA) and 99.6% (KNN) for the G. salaris-G. thymalli pairing and 97.2% (LDA) and 99.2% (KNN) for the G. derjavini-G. truttae pairing. Paired species were then discriminated individually in the second stage of the classifier using data from the full set of hooks. These analyses demonstrate that, using the LDA and KNN methods of statistical classification, closely related and pathogenic species of Gyrodactylus can be discriminated using data derived from light microscope studies.
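The 2-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the measured variables, the value of k, or any code, so everything below is an assumption. The data are synthetic, the column layout (two "marginal hook" columns followed by two other hook columns) is hypothetical, and only KNN is shown, written with the Python standard library; LDA would slot into the same two stages.

```python
import math
import random
from collections import Counter

# Species pairings taken from the abstract.
PAIR_OF = {
    "G. salaris": "salaris-thymalli",
    "G. thymalli": "salaris-thymalli",
    "G. derjavini": "derjavini-truttae",
    "G. truttae": "derjavini-truttae",
}

# SYNTHETIC cluster centres, NOT real hook measurements (assumption).
# Columns 0-1 stand in for marginal hook variables, columns 2-3 for the rest.
CENTRES = {
    "G. salaris": (10.0, 10.0, 0.0, 0.0),
    "G. thymalli": (10.0, 10.0, 5.0, 5.0),
    "G. derjavini": (0.0, 0.0, 0.0, 0.0),
    "G. truttae": (0.0, 0.0, 5.0, 5.0),
}
MARGINAL = slice(0, 2)  # hypothetical position of the marginal hook columns


def make_samples(rng, n_per_species, sd=0.5):
    """Draw Gaussian points around each synthetic species centre."""
    X, y = [], []
    for species, centre in CENTRES.items():
        for _ in range(n_per_species):
            X.append(tuple(rng.gauss(c, sd) for c in centre))
            y.append(species)
    return X, y


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def two_stage_predict(train_X, train_y, x, k=3):
    # Stage 1: assign the specimen to a species pair, marginal columns only.
    stage1_X = [row[MARGINAL] for row in train_X]
    stage1_y = [PAIR_OF[s] for s in train_y]
    pair = knn_predict(stage1_X, stage1_y, x[MARGINAL], k)
    # Stage 2: separate the two species within that pair, all columns.
    idx = [i for i, s in enumerate(train_y) if PAIR_OF[s] == pair]
    return knn_predict([train_X[i] for i in idx], [train_y[i] for i in idx], x, k)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_X, train_y = make_samples(rng, 30)
test_X, test_y = make_samples(rng, 10)
predictions = [two_stage_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"two-stage KNN accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic clusters are deliberately well separated, so the accuracy printed here says nothing about the performance reported in the abstract; the sketch only shows how a pair-level decision can precede a species-level one.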