Akdemir Deniz, Somo Mohamed, Isidro-Sánchez Julio
Center of International Bone Marrow Transplantation Research, Minneapolis, MN 55401-1206, USA.
Syngenta Seeds, Junction City, KS 66441, USA.
Axioms. 2023 Feb;12(2). doi: 10.3390/axioms12020161. Epub 2023 Feb 4.
The generation of unprecedented amounts of data brings new challenges in data management, but also an opportunity to accelerate the identification of processes across multiple scientific disciplines. One of these challenges is the harmonization of high-dimensional, unbalanced, and heterogeneous data. In this manuscript, we propose a statistical approach to combine incomplete and partially overlapping pieces of covariance matrices that come from independent experiments. We assume that the data are a random sample of partial covariance matrices drawn from Wishart distributions, and we derive an expectation-maximization algorithm for parameter estimation. We demonstrate the properties of our method using (i) simulation studies and (ii) empirical datasets. In general, being able to make inferences about the covariance of variables not observed in the same experiment is valuable for data analysis, since covariance estimation is an important step in many statistical applications, such as multivariate analysis, principal component analysis, factor analysis, and structural equation modeling.
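The idea of combining partially overlapping covariance blocks with an expectation-maximization algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each partial matrix has been scaled so that its expectation equals the corresponding block of the full covariance matrix, and that all experiments share the same Wishart degrees of freedom, so they are weighted equally. The function names `em_step` and `combine_partial_covariances` are hypothetical. The E-step uses the standard conditional expectation of a partitioned Wishart matrix given its observed block, and the M-step averages the completed matrices.

```python
import numpy as np


def em_step(sigma, partial_covs):
    """One EM update of the full covariance estimate `sigma`.

    `partial_covs` is a list of (idx, Y) pairs: `idx` lists the variables
    observed in one experiment, and Y is the covariance matrix of those
    variables (assumed scaled so that E[Y] is the matching block of sigma).
    """
    p = sigma.shape[0]
    total = np.zeros((p, p))
    for idx, Y in partial_covs:
        a = np.asarray(idx)                   # observed variables
        b = np.setdiff1d(np.arange(p), a)     # unobserved variables
        completed = np.zeros((p, p))
        completed[np.ix_(a, a)] = Y
        if b.size:
            # Regression coefficients B = Sigma_aa^{-1} Sigma_ab
            B = np.linalg.solve(sigma[np.ix_(a, a)], sigma[np.ix_(a, b)])
            # Conditional covariance Sigma_bb - Sigma_ba Sigma_aa^{-1} Sigma_ab
            cond = sigma[np.ix_(b, b)] - sigma[np.ix_(b, a)] @ B
            # E-step: expected missing blocks given the observed block Y
            completed[np.ix_(a, b)] = Y @ B
            completed[np.ix_(b, a)] = (Y @ B).T
            completed[np.ix_(b, b)] = B.T @ Y @ B + cond
        total += completed
    # M-step: average the completed matrices (equal degrees of freedom assumed)
    new_sigma = total / len(partial_covs)
    return (new_sigma + new_sigma.T) / 2      # remove round-off asymmetry


def combine_partial_covariances(partial_covs, p, max_iter=1000, tol=1e-8):
    """Iterate EM updates from the identity matrix until the change is small."""
    sigma = np.eye(p)
    for _ in range(max_iter):
        new_sigma = em_step(sigma, partial_covs)
        if np.max(np.abs(new_sigma - sigma)) < tol:
            return new_sigma
        sigma = new_sigma
    return sigma


# Illustrative data (made up): noise-free blocks of a known 4 x 4 covariance,
# observed in three "experiments" that share only some variables.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
true_sigma = A @ A.T + 4 * np.eye(4)
subsets = [[0, 1, 2], [1, 2, 3], [0, 3]]
partial_covs = [(s, true_sigma[np.ix_(s, s)]) for s in subsets]

estimate = combine_partial_covariances(partial_covs, p=4)
```

In this toy setup every pair of variables appears together in at least one block, and the true matrix is a fixed point of the update, since completing each noise-free block under the true covariance reproduces that covariance. With real data the blocks are noisy, degrees of freedom may differ between experiments, and pairs of variables that never co-occur are informed only indirectly through the variables they share with others.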