Science, Technology and Research Institute of Delaware, Wilmington, DE, United States of America.
PLoS One. 2019 Oct 25;14(10):e0224460. doi: 10.1371/journal.pone.0224460. eCollection 2019.
We show that a two-component proportional representation provides the necessary framework to account for the properties of a 2 × 2 contingency table. This corresponds to factorizing the table as the product of a proportion matrix and a diagonal matrix of row or column sums. Row-sum- and column-sum-invariant measures of proportional variation are obtained. Geometrically, these correspond to displacements of two point vectors in the standard one-simplex, which are reduced to a center-of-mass coordinate representation, [Formula: see text]. Effect size measures, such as the odds ratio and relative risk, then correspond to different perspective functions for the mapping of (δ, μ) to [Formula: see text]. Furthermore, variations in δ and μ will be associated with different cost-benefit trade-offs for a given application. Therefore, pure mathematics alone does not specify a general form for the perspective function. This implies that the question of the merits of the odds ratio versus the relative risk cannot be resolved in a general way. Expressions are obtained for the marginal sum dependence and for the relations between various effect size measures, including the simple matching coefficient, odds ratio, relative risk, Yule's Q, ϕ, and Goodman and Kruskal's τc|r. We also show that Gini information gain (IGG) is equivalent to ϕ² in the classification and regression tree (CART) algorithm. Consequently, IGG can yield misleading results because of its dependence on marginal sums. Monte Carlo methods facilitate the detailed specification of stochastic effects in the data acquisition process and provide a practical way to estimate the confidence interval for an effect size.
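To make the quantities above concrete, here is a minimal Python sketch under stated assumptions; it is not the author's implementation. It assumes the center-of-mass coordinates are δ = p1 − p2 and μ = (p1 + p2)/2, where p1 and p2 are the column-1 proportions within each row (the paper's exact convention may differ). Under that assumption, the relative risk and odds ratio are written as functions of (δ, μ). The sketch also computes Yule's Q, ϕ, and the simple matching coefficient, and checks numerically that the Gini gain of a CART split equals the parent node's Gini impurity times ϕ². For a fixed parent node, the two measures therefore rank candidate splits identically, which is one reading of the stated equivalence. The Monte Carlo step assumes one particular model of data acquisition: fixed row sums and independent binomial sampling within each row. All names (`Table2x2`, `center_of_mass`, `monte_carlo_ci`, and so on) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    # Rows: the two groups (or the two branches of a CART split).
    # Columns: outcome/class 1 and outcome/class 2.
    a: int  # row 1, column 1
    b: int  # row 1, column 2
    c: int  # row 2, column 1
    d: int  # row 2, column 2

    def row_proportions(self):
        # p1, p2: proportion of column-1 counts within each row
        return self.a / (self.a + self.b), self.c / (self.c + self.d)


def center_of_mass(t):
    # Assumed convention: delta = p1 - p2, mu = (p1 + p2) / 2
    p1, p2 = t.row_proportions()
    return p1 - p2, (p1 + p2) / 2


def relative_risk(delta, mu):
    # p1 / p2 with p1 = mu + delta/2 and p2 = mu - delta/2 (undefined if p2 == 0)
    return (mu + delta / 2) / (mu - delta / 2)


def odds_ratio(delta, mu):
    # [p1 / (1 - p1)] / [p2 / (1 - p2)] expressed through (delta, mu)
    p1, p2 = mu + delta / 2, mu - delta / 2
    return (p1 * (1 - p2)) / (p2 * (1 - p1))


def yules_q(t):
    return (t.a * t.d - t.b * t.c) / (t.a * t.d + t.b * t.c)


def phi(t):
    n1, n2 = t.a + t.b, t.c + t.d  # row sums
    m1, m2 = t.a + t.c, t.b + t.d  # column sums
    return (t.a * t.d - t.b * t.c) / math.sqrt(n1 * n2 * m1 * m2)


def simple_matching(t):
    return (t.a + t.d) / (t.a + t.b + t.c + t.d)


def gini(p):
    # Gini impurity of a binary node whose class-1 proportion is p
    return 2 * p * (1 - p)


def gini_gain(t):
    # Parent impurity minus the row-size-weighted impurity of the two branches
    n1, n2 = t.a + t.b, t.c + t.d
    n = n1 + n2
    p1, p2 = t.row_proportions()
    parent = gini((t.a + t.c) / n)
    return parent - (n1 / n) * gini(p1) - (n2 / n) * gini(p2)


def monte_carlo_ci(t, stat, draws=2000, level=0.95, seed=0):
    # Assumed acquisition model: row sums fixed, each row binomial with its observed p.
    rng = random.Random(seed)
    n1, n2 = t.a + t.b, t.c + t.d
    p1, p2 = t.row_proportions()
    values = []
    for _ in range(draws):
        a = sum(rng.random() < p1 for _ in range(n1))
        c = sum(rng.random() < p2 for _ in range(n2))
        values.append(stat(Table2x2(a, n1 - a, c, n2 - c)))
    values.sort()
    # Percentile interval from the simulated distribution of the statistic
    return values[int((1 - level) / 2 * draws)], values[int((1 + level) / 2 * draws) - 1]


# Usage with an illustrative (made-up) table
t = Table2x2(a=30, b=70, c=15, d=85)
delta, mu = center_of_mass(t)
rr, orr = relative_risk(delta, mu), odds_ratio(delta, mu)
ci = monte_carlo_ci(t, lambda s: center_of_mass(s)[0])  # interval for delta
```

The interval is computed for δ rather than the odds ratio only because δ stays defined when a simulated cell count is zero. Ratio measures would need a guard or a continuity correction, which is a further modelling choice.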