Chen Ting-Li, Chou Elizabeth P, Fushing Hsieh
Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan.
Department of Statistics, National Chengchi University, Taipei 11605, Taiwan.
Entropy (Basel). 2021 Dec 15;23(12):1684. doi: 10.3390/e23121684.
Without assuming any functional or distributional structure, we select collections of major factors embedded within response-versus-covariate (Re-Co) dynamics via two selection criteria, [C1: confirmable] and [C2: irreplaceable], which are based on information-theoretic measurements. The two criteria are constructed within the computing paradigm called Categorical Exploratory Data Analysis (CEDA) and are linked to Wiener-Granger causality. All the information-theoretic measurements, including conditional mutual information and entropy, are evaluated through the contingency table platform, which rests primarily on the categorical nature of all involved features, whatever their data type: quantitative or qualitative. Our selection task identifies one chief collection, together with several secondary collections, of major factors of various orders underlying the targeted Re-Co dynamics. Each selected collection is checked with algorithmically computed reliability against the finite sample phenomenon, and so is each of its member major factors individually. The development of our selection protocol is illustrated in detail through two experimental examples: a simple one and a complex one. We then apply this protocol to two data sets pertaining to two somewhat related but distinct pitching dynamics of two pitch types: slider and fastball. In particular, we refer to a specific Major League Baseball (MLB) pitcher and consider data from multiple seasons.
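To make the contingency-table evaluation of entropy and conditional mutual information concrete, here is a minimal sketch in Python. It is not the authors' CEDA implementation: the function names and the toy data are hypothetical, and it assumes every feature has already been categorized. It uses the standard identities H(Y | X) = H(X, Y) - H(X) and I(Y; A | B) = H(Y | B) - H(Y | A, B), with all quantities estimated from empirical cell counts.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(y, x):
    """H(Y | X) = H(X, Y) - H(X), from the empirical contingency table."""
    return entropy(list(zip(x, y))) - entropy(list(x))


def conditional_mutual_information(y, a, b):
    """I(Y; A | B) = H(Y | B) - H(Y | A, B)."""
    return conditional_entropy(y, b) - conditional_entropy(y, list(zip(a, b)))


# Hypothetical toy data: the response is the XOR of two binary covariates,
# so neither covariate is informative alone but the pair is.
a = [0, 0, 1, 1] * 2
b = [0, 1, 0, 1] * 2
y = [ai ^ bi for ai, bi in zip(a, b)]

h_y = entropy(y)
h_y_given_a = conditional_entropy(y, a)
h_y_given_ab = conditional_entropy(y, list(zip(a, b)))
cmi_a_given_b = conditional_mutual_information(y, a, b)

print(h_y, h_y_given_a, h_y_given_ab, cmi_a_given_b)
```

In this toy case, conditioning on either covariate alone leaves the response entropy unchanged, while conditioning on the pair removes it entirely. That is the kind of joint, higher-order effect a collection-based selection is meant to capture. With real data the cell counts are finite and often sparse, so these estimates would also need the kind of reliability check the abstract describes.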