Department of Biology, Carleton University, Ottawa, Canada.
PLoS One. 2010 Jun 1;5(6):e10912. doi: 10.1371/journal.pone.0010912.
BACKGROUND: How can we compute a segregation or diversity index from a three-way or multi-way contingency table, where each variable can take on an arbitrary finite number of values and where the index takes values between zero and one? Existing methods cover only two-way contingency tables or dichotomous variables. A prototypical three-way case is the segregation index of a set of industries or departments given two explanatory variables, sex and race. The approach can be extended to further variables, such as disability, number of years of education, and former military service.
METHODOLOGY/PRINCIPAL FINDINGS: We extend existing segregation indices based on Euclidean distance (the square of the coefficient of variation) and on the Boltzmann/Shannon/Theil index from two-way to multi-way contingency tables by including multiple summations. We provide several biological applications, such as indices for age polyethism and linkage disequilibrium. We also provide a new heuristic conceptualization of entropy-based indices. Higher-order association measures are often independent of lower-order ones; hence an overall segregation or diversity index should be the arithmetic mean of the normalized association measures at all orders. These methods remain applicable when individuals self-identify as multiple races or even multiple sexes, and when individuals work part-time in multiple industries.
CONCLUSIONS/SIGNIFICANCE: This work has direct policy implications: it allows people to rigorously test whether employment or biological diversity has changed.
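To make the idea of an entropy-based index with multiple summations concrete, the following Python sketch computes one possible multi-way index from a table of counts. It is an illustration only and not necessarily the formula used in the paper: it assumes the association measure is the total correlation (the sum of the marginal entropies minus the joint entropy, which involves a summation over every cell of the table), and it assumes a normalization by the largest value that quantity can reach for the given marginals, so that the result lies between zero and one. The function names `marginal_entropy` and `multiway_entropy_index` are hypothetical.

```python
import numpy as np


def marginal_entropy(p, axis):
    """Shannon entropy (natural log) of the marginal distribution along one axis."""
    other_axes = tuple(a for a in range(p.ndim) if a != axis)
    m = p.sum(axis=other_axes)
    m = m[m > 0]  # by convention 0 * log(0) = 0, so empty cells are dropped
    return float(-(m * np.log(m)).sum())


def multiway_entropy_index(counts):
    """Normalized total correlation of a multi-way contingency table.

    Assumption (not taken from the paper): the index is
        (sum of marginal entropies - joint entropy)
        / (sum of marginal entropies - largest marginal entropy),
    which is 0 when the variables are independent and 1 when every
    variable is fully determined by the one with the largest entropy.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()  # cell proportions
    nonzero = p[p > 0]
    joint = float(-(nonzero * np.log(nonzero)).sum())
    marginals = [marginal_entropy(p, a) for a in range(p.ndim)]
    total_correlation = sum(marginals) - joint
    upper_bound = sum(marginals) - max(marginals)
    if upper_bound <= 0:  # at most one variable actually varies
        return 0.0
    return total_correlation / upper_bound


# Example: a 2 x 2 x 2 table (e.g. industry x sex x race) in which
# the three variables are perfectly associated.
segregated = np.zeros((2, 2, 2))
segregated[0, 0, 0] = 50
segregated[1, 1, 1] = 50
print(multiway_entropy_index(segregated))
```

A table built as the outer product of its marginals (statistical independence) gives a value of zero up to rounding error, while the perfectly associated table above gives one. Other normalizations are possible, and averaging normalized measures across orders, as the abstract proposes, would require computing the lower-order terms separately.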