Ecosystèmes Lagunaires, UMR 5119, CNRS, IFREMER, IRD, Université Montpellier 2, CC93, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France.
Ecol Appl. 2011 Jun;21(4):1352-64. doi: 10.1890/09-1887.1.
Reliable assessment of fish origin is of critical importance for exploited species, since nursery areas must be identified and protected to maintain recruitment to the adult stock. During the last two decades, otolith chemical signatures (or "fingerprints") have been increasingly used as tools to discriminate between coastal habitats. However, correct assessment of fish origin from otolith fingerprints depends on various environmental and methodological parameters, including the choice of the statistical method used to assign fish of unknown origin to a source. Among the available methods of classification, Linear Discriminant Analysis (LDA) is the most frequently used, although it assumes data are multivariate normal with homogeneous within-group dispersions, conditions that are not always met by otolith chemical data, even after transformation. Other less constrained classification methods are available, but comparative analyses of their application to otolith microchemistry are currently lacking. Here, we assessed stock identification accuracy for four classification methods (LDA, Quadratic Discriminant Analysis [QDA], Random Forests [RF], and Artificial Neural Networks [ANN]), through the use of three distinct data sets. In each case, all possible combinations of chemical elements were examined to identify the elements to be used for optimal accuracy in fish assignment to their actual origin. Our study shows that accuracy differs according to the model and the number of elements considered. Best combinations did not include all the elements measured, and it was not possible to define an ad hoc multielement combination for accurate site discrimination. Among all the models tested, RF and ANN performed best, especially for complex data sets (e.g., with numerous fish species and/or chemical elements involved). However, for these data, RF was less time-consuming and more interpretable than ANN, and far more efficient and less demanding in terms of assumptions than LDA or QDA.
Therefore, when LDA and QDA assumptions cannot be met, the use of machine learning methods, such as RF, should be preferred for stock assessment and nursery identification based on otolith microchemistry, especially when data sets include multispecific otolith signatures and/or many chemical elements.
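
The exhaustive search over element combinations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the nearest-centroid classifier is a simple stand-in (it is none of the four methods compared in the study), leave-one-out validation is an assumed scoring scheme, the function names are hypothetical, and the six "otolith" records are invented toy values.

```python
from itertools import combinations
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT LDA/QDA/RF/ANN): assign x to the site
    whose mean signature is closest in squared Euclidean distance."""
    by_site = {}
    for row, site in zip(train_X, train_y):
        by_site.setdefault(site, []).append(row)
    centroids = {
        site: [mean(col) for col in zip(*rows)]
        for site, rows in by_site.items()
    }
    return min(
        centroids,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(x, centroids[s])),
    )


def loo_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out accuracy (assumed scheme): each fish is classified
    by a model built from all the other fish."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def rank_element_subsets(data, elements, y, predict=nearest_centroid_predict):
    """Score every non-empty subset of elements (2**n - 1 subsets) and
    return (accuracy, subset) pairs, best first, smaller subsets on ties."""
    results = []
    for k in range(1, len(elements) + 1):
        for subset in combinations(elements, k):
            X = [[row[e] for e in subset] for row in data]
            results.append((loo_accuracy(X, y, predict), subset))
    results.sort(key=lambda r: (-r[0], len(r[1])))
    return results


# Invented toy data: three fish from each of two hypothetical sites.
elements = ["Sr", "Ba", "Mg"]
data = [
    {"Sr": 1.0, "Ba": 5.0, "Mg": 0.3},
    {"Sr": 1.1, "Ba": 1.0, "Mg": 0.5},
    {"Sr": 0.9, "Ba": 9.0, "Mg": 0.4},
    {"Sr": 2.0, "Ba": 1.5, "Mg": 0.4},
    {"Sr": 2.1, "Ba": 8.5, "Mg": 0.3},
    {"Sr": 1.9, "Ba": 5.5, "Mg": 0.5},
]
sites = ["A", "A", "A", "B", "B", "B"]

results = rank_element_subsets(data, elements, sites)
for accuracy, subset in results:
    print(f"{'+'.join(subset):<10} {accuracy:.2f}")
```

In this toy example the full three-element set scores lower than the best single element, because a noisy element dominates the distance. That mirrors, in miniature, the finding that the best combinations did not include all measured elements. To compare real classifiers, the `predict` argument could be replaced by a wrapper around a library model (for instance a random forest or LDA implementation), and the number of subsets to evaluate grows as 2^n - 1 with the number of elements n.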