Department of Biological Sciences, Marshall University, One John Marshall Drive, Huntington, WV 25701, USA.
Department of Anatomy, Des Moines University, 3200 Grand Avenue, Des Moines, IA 50312, USA.
Syst Biol. 2022 Jun 16;71(4):810-822. doi: 10.1093/sysbio/syab088.
This article investigates a form of rank deficiency in phenotypic covariance matrices derived from geometric morphometric data, and its impact on measures of phenotypic integration. We first define a type of rank deficiency based on information theory, then demonstrate that this deficiency impairs the performance of phenotypic integration metrics in a model system. Lastly, we propose methods to correct for this information rank deficiency. Our first goal is to establish how the rank of a typical geometric morphometric covariance matrix relates to the information entropy of its eigenvalue spectrum. This requires clear definitions of matrix rank, of which we define three: the full matrix rank (equal to the number of input variables), the mathematical rank (the number of nonzero eigenvalues), and the information rank or "effective rank" (equal to the number of nonredundant eigenvalues). We demonstrate that effective rank deficiency arises from a combination of methodological factors (generalized Procrustes analysis, use of the correlation matrix, and insufficient sample size) as well as phenotypic covariance. Second, we use dire wolf jaws to document how differences in effective rank deficiency bias two metrics used to measure phenotypic integration. The eigenvalue variance characterizes the integration change incorrectly, and the standardized generalized variance lacks the sensitivity needed to detect subtle changes in integration. Both metrics are affected by the inclusion of many small, but nonzero, eigenvalues arising from a lack of information in the covariance matrix, a problem that usually becomes more pronounced as the number of landmarks increases. We propose a new metric for phenotypic integration that combines the standardized generalized variance with information entropy. This metric is equivalent to the standardized generalized variance but is calculated only from those eigenvalues that carry nonredundant information; that is, it is the standardized generalized variance scaled to the effective rank of the eigenvalue spectrum. We demonstrate that this metric successfully detects the shift of integration in our dire wolf sample. Our third goal is to generalize the new metric to compare data sets with different sample sizes and numbers of variables. We develop a standardization for matrix information based on data permutation, then demonstrate that Smilodon jaws are more integrated than dire wolf jaws. Finally, we describe how our information entropy-based measure allows phenotypic integration to be compared in dense semilandmark data sets without bias, allowing characterization of the information content of any given shape, a quantity we term "latent dispersion". [Canis dirus; dire wolf; effective dispersion; effective rank; geometric morphometrics; information entropy; latent dispersion; modularity and integration; phenotypic integration; relative dispersion.]
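The three ranks and the integration metrics named above can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the definitions used here are assumptions: the effective rank is taken as the exponential of the Shannon entropy of the normalized eigenvalues (a common definition), the standardized generalized variance is taken as the geometric mean of the eigenvalues, and the entropy-scaled metric is read as that geometric mean restricted to the leading eigenvalues up to the rounded effective rank. All function names are hypothetical, and the data are random numbers, not morphometric measurements.

```python
import numpy as np

def eigenvalues_desc(cov):
    """Eigenvalues of a symmetric covariance matrix, largest first."""
    vals = np.linalg.eigvalsh(cov)      # ascending order for symmetric input
    vals = np.clip(vals, 0.0, None)     # remove tiny negative round-off
    return vals[::-1]

def mathematical_rank(eigvals, tol=1e-10):
    """Number of eigenvalues that are nonzero relative to the largest one."""
    return int(np.sum(eigvals > tol * eigvals.max()))

def effective_rank(eigvals):
    """Assumed definition: exp of the Shannon entropy of normalized eigenvalues."""
    p = eigvals / eigvals.sum()
    p = p[p > 0]                        # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))

def eigenvalue_variance(eigvals):
    """Variance of the eigenvalue spectrum."""
    return float(np.var(eigvals))

def sgv(eigvals):
    """Assumed definition: geometric mean of the eigenvalues (det ** (1/p))."""
    if np.any(eigvals <= 0):
        return 0.0
    return float(np.exp(np.mean(np.log(eigvals))))

def effective_dispersion(eigvals):
    """Assumed reading: SGV over the leading round(effective rank) eigenvalues."""
    k = max(1, int(round(effective_rank(eigvals))))
    return sgv(eigvals[:k])

# Synthetic example: 20 specimens, 30 variables (fewer specimens than variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))
cov = np.cov(X, rowvar=False)
ev = eigenvalues_desc(cov)

print("full rank:", cov.shape[0])
print("mathematical rank:", mathematical_rank(ev))
print("effective rank:", effective_rank(ev))
print("SGV over all eigenvalues:", sgv(ev))
print("SGV over effective rank:", effective_dispersion(ev))
```

Under these assumptions, a spectrum with many tiny but nonzero eigenvalues drives the full-spectrum geometric mean toward zero, while the entropy-restricted version ignores the eigenvalues that carry almost no information. The permutation-based standardization mentioned in the abstract is not sketched here because the abstract does not describe its procedure.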