Petersen Anne Helby, Markussen Bo, Christensen Karl Bang
Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark.
J Appl Stat. 2020 May 27;48(9):1675-1695. doi: 10.1080/02664763.2020.1773772. eCollection 2021.
Datasets are sometimes divided into distinct subsets, e.g. due to multi-center sampling, or to variations in instruments, questionnaire item ordering or mode of administration, and the data analyst then needs to assess whether a joint analysis is meaningful. The Principal Component Analysis-based Data Structure Comparisons (PCADSC) tools are three new non-parametric, visual diagnostic tools for investigating differences in structure between two subsets of a dataset by comparing their covariance matrices using principal component analysis. The PCADSC tools are demonstrated in a data example using European Social Survey data on psychological well-being in three countries: Denmark, Sweden, and Bulgaria. The data structures are found to differ between Denmark and Bulgaria, so comparing, for example, mean psychological well-being scores between these two countries is not meaningful. However, when comparing Denmark and Sweden, very similar data structures, and thus comparable concepts of well-being, are found. Therefore, inter-country comparisons are warranted for these two countries.
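The general idea of comparing two subsets through their covariance structure along shared principal components can be illustrated with a minimal sketch. This is not the PCADSC implementation and does not reproduce any of the three tools; the function name `group_variance_along_joint_pcs`, the standardization choices, and the simulated data are all assumptions made for illustration. The sketch computes principal components from the pooled data and then reports how much variance each subset has along each component; similar values suggest similar structure, while large discrepancies suggest the subsets differ.

```python
import numpy as np


def group_variance_along_joint_pcs(a, b):
    """Variance of two subsets along the principal components of their pooled data.

    a, b: 2-D arrays (rows = observations) with the same, non-constant columns.
    Returns (var_a, var_b), ordered from the largest joint component to the smallest.
    Hypothetical helper for illustration only; not part of any published package.
    """
    combined = np.vstack([a, b])
    # Put all variables on a common scale using the pooled mean and sd.
    mean = combined.mean(axis=0)
    sd = combined.std(axis=0, ddof=1)
    za = (a - mean) / sd
    zb = (b - mean) / sd
    # Center each subset separately so that differences in means
    # are not mistaken for differences in covariance structure.
    za = za - za.mean(axis=0)
    zb = zb - zb.mean(axis=0)
    # Principal components of the pooled, group-centered data.
    joint_cov = np.cov(np.vstack([za, zb]), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(joint_cov)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    # Variance of each subset's scores along every joint component.
    var_a = np.var(za @ eigvecs, axis=0, ddof=1)
    var_b = np.var(zb @ eigvecs, axis=0, ddof=1)
    return var_a, var_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: one subset with independent items, one driven by a single factor.
    independent = rng.normal(size=(300, 4))
    factor = rng.normal(size=(300, 1))
    one_factor = factor + 0.3 * rng.normal(size=(300, 4))
    var_a, var_b = group_variance_along_joint_pcs(independent, one_factor)
    print("variance ratio per component:", var_b / var_a)
```

Ratios close to 1 on every component would be consistent with similar covariance structures. How far from 1 a ratio must be before it counts as a real difference is not addressed by this sketch.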