IEEE Trans Med Imaging. 2018 Aug;37(8):1761-1774. doi: 10.1109/TMI.2018.2815583. Epub 2018 Mar 13.
Reducing the number of false discoveries is presently one of the most pressing issues in the life sciences. It is of especially great importance for many applications in neuroimaging and genomics, where data sets are typically high-dimensional, which means that the number of explanatory variables exceeds the sample size. The false discovery rate (FDR) is a criterion that can be employed to address that issue. Thus it has gained great popularity as a tool for testing multiple hypotheses. Canonical correlation analysis (CCA) is a statistical technique that is used to make sense of the cross-correlation of two sets of measurements collected on the same set of samples (e.g., brain imaging and genomic data for the same mental illness patients), and sparse CCA extends the classical method to high-dimensional settings. Here, we propose a way of applying the FDR concept to sparse CCA, and a method to control the FDR. The proposed FDR correction directly influences the sparsity of the solution, adapting it to the unknown true sparsity level. Theoretical derivation as well as simulation studies show that our procedure indeed keeps the FDR of the canonical vectors below a user-specified target level. We apply the proposed method to an imaging genomics data set from the Philadelphia Neurodevelopmental Cohort. Our results link the brain connectivity profiles derived from brain activity during an emotion identification task, as measured by functional magnetic resonance imaging, to the corresponding subjects' genomic data.
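As background on what controlling the FDR at a target level means in ordinary multiple testing, the sketch below implements the classical Benjamini-Hochberg step-up rule. The abstract does not describe the sparse CCA procedure itself, so this is not the authors' method. It is only a generic illustration, and the function name `benjamini_hochberg` and the parameter `q` (the user-specified target level) are assumptions made for this example.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Illustrative Benjamini-Hochberg step-up rule (not the paper's sparse CCA procedure).

    Returns a list of booleans, True where the hypothesis is rejected.
    Under independent p-values this rule keeps the expected proportion of
    false rejections among all rejections at or below q.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= q * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank

    # Reject every hypothesis ranked at or below k (the "step-up" part).
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, chosen only for demonstration.
    print(benjamini_hochberg([0.041, 0.001, 0.205, 0.008], q=0.05))
```

Because the rule is step-up, a hypothesis whose own p-value misses its threshold is still rejected when a higher-ranked p-value meets its threshold. The number of rejections therefore adapts to the data rather than being fixed in advance, which is loosely analogous to the abstract's remark that the FDR correction adapts the sparsity of the solution.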