Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
Department of Statistics and Data Science, Hebrew University, Jerusalem, Israel.
Stat Med. 2024 Dec 20;43(29):5473-5483. doi: 10.1002/sim.10257. Epub 2024 Oct 25.
Paired organs such as the eyes, ears, and lungs in humans exhibit similarities, and data from these organs often display remarkable correlations. Accounting for these correlations could enhance classification models used in predicting disease phenotypes. To our knowledge, there is limited, if any, literature addressing this topic, and existing methods do not exploit such correlations. For example, the conventional approach treats each ear as an independent observation when predicting audiometric phenotypes and ignores the correlation between data from the two ears of the same person. This approach may lead to information loss and reduce model performance. In response to this gap, particularly in the context of audiometric phenotype prediction, this paper proposes new quadratic discriminant analysis (QDA) algorithms that appropriately deal with the dependence between ears. We propose two-stage analysis strategies: (1) conducting data transformations to reduce data dimensionality before applying QDA; and (2) developing new QDA algorithms that partially utilize the dependence between the phenotypes of the two ears. We conducted simulation studies to compare different transformation methods and to assess the performance of different QDA algorithms. The empirical results suggested that the transformation may be beneficial only when the sample size is relatively small. Moreover, our proposed new QDA algorithms performed better than the conventional approach in both person-level and ear-level accuracy. As an illustration, we applied them to audiometric data from the Medical University of South Carolina Longitudinal Cohort Study of Age-related Hearing Loss. In addition, we developed an R package, PairQDA, to implement the proposed algorithms.
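To make the contrast between the conventional approach and a dependence-aware one concrete, the following is a minimal sketch on simulated data. It is not the PairQDA implementation and does not reproduce the algorithms proposed in the paper, whose details are not given in this abstract. Everything in it is an illustrative assumption: the toy data generator (`simulate_pairs`, with a shared person-level component of weight `rho`), the binary per-ear phenotype, and the "joint" variant, which simply fits ordinary QDA to the concatenated features of both ears with the pair of ear phenotypes as the class label. The data-transformation stage is omitted.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate the prior, mean, and covariance of each class (Gaussian QDA)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def predict_qda(params, X):
    """Assign each row to the class with the largest quadratic discriminant score."""
    classes = list(params)
    scores = []
    for c in classes:
        prior, mu, cov = params[c]
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        scores.append(np.log(prior) - 0.5 * logdet - 0.5 * maha)
    return np.array(classes)[np.argmax(np.array(scores), axis=0)]

def simulate_pairs(n, rho, rng):
    """Hypothetical generator: 2 features per ear, one binary phenotype per ear."""
    y_left = rng.integers(0, 2, size=n)
    same = rng.random(n) < 0.8                 # ears usually share a phenotype
    y_right = np.where(same, y_left, 1 - y_left)
    shared = rng.normal(size=(n, 2))           # person-level component
    x_left = (np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=(n, 2))
              + y_left[:, None])
    x_right = (np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=(n, 2))
               + y_right[:, None])
    return x_left, x_right, y_left, y_right

def accuracies(pred_l, pred_r, y_l, y_r):
    """Ear-level: fraction of ears correct. Person-level: both ears correct."""
    ear = np.mean(np.concatenate([pred_l == y_l, pred_r == y_r]))
    person = np.mean((pred_l == y_l) & (pred_r == y_r))
    return ear, person

def compare(seed=0, n_train=2000, n_test=2000, rho=0.6):
    rng = np.random.default_rng(seed)
    xl, xr, yl, yr = simulate_pairs(n_train, rho, rng)
    txl, txr, tyl, tyr = simulate_pairs(n_test, rho, rng)

    # Conventional: pool all ears and treat each one as an independent observation.
    conv = fit_qda(np.vstack([xl, xr]), np.concatenate([yl, yr]))
    conv_acc = accuracies(predict_qda(conv, txl), predict_qda(conv, txr), tyl, tyr)

    # Joint (illustrative only): one observation per person, features of both
    # ears concatenated, class = (left, right) phenotype pair coded as 2*left + right.
    joint = fit_qda(np.hstack([xl, xr]), 2 * yl + yr)
    code = predict_qda(joint, np.hstack([txl, txr]))
    joint_acc = accuracies(code // 2, code % 2, tyl, tyr)

    return {"conventional": conv_acc, "joint": joint_acc}

if __name__ == "__main__":
    for name, (ear, person) in compare().items():
        print(f"{name:>12}: ear-level={ear:.3f}  person-level={person:.3f}")
```

The joint variant can use the between-ear covariance because each class covariance is estimated on the full concatenated feature vector, whereas the pooled per-ear fit never sees the two ears of a person together. The cost is a larger number of parameters per class, which is one plausible reason a dimension-reducing transformation could matter more at small sample sizes.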