Li Quefeng, Li Lexin
Department of Biostatistics, University of North Carolina at Chapel Hill, 3105D McGavran-Greenberg Hall, Chapel Hill, North Carolina 27599, U.S.A.
Division of Biostatistics, University of California at Berkeley, 50 University Hall 7360, Berkeley, California 94720, U.S.A.
Biometrika. 2018 Dec;105(4):917-930. doi: 10.1093/biomet/asy047. Epub 2018 Oct 22.
Multiple types of data measured on a common set of subjects arise in many areas. Numerous empirical studies have found that integrative analysis of such data can result in better statistical performance in terms of prediction and feature selection. However, the advantages of integrative analysis have mostly been demonstrated empirically. In the context of two-class classification, we propose an integrative linear discriminant analysis method and establish a theoretical guarantee that it achieves a smaller classification error than running linear discriminant analysis on each data type individually. We address the issues of outliers and missing values, frequently encountered in integrative analysis, and illustrate our method through simulations and a neuroimaging study of Alzheimer's disease.