Helsinki Institute for Information Technology, Department of Information and Computer Science, Aalto University, Espoo, Finland.
PLoS Comput Biol. 2013 Apr;9(4):e1003018. doi: 10.1371/journal.pcbi.1003018. Epub 2013 Apr 18.
Biomarker discovery aims to find small subsets of relevant variables in 'omics data that correlate with the clinical syndromes of interest. Although clinical phenotypes are usually characterized by a complex set of clinical parameters, current computational approaches assume univariate targets, e.g. diagnostic classes, against which associations are sought. We propose an approach based on asymmetrical sparse canonical correlation analysis (SCCA) that finds multivariate correlations between the 'omics measurements and the complex clinical phenotypes. We correlated plasma proteomics data to multivariate, overlapping, complex clinical phenotypes from tuberculosis and malaria datasets. We discovered relevant 'omic biomarkers that have a high correlation to profiles of clinical measurements and are remarkably sparse, containing 1.5-3% of all 'omic variables. We show that using clinical view projections we obtain remarkable improvements in diagnostic class prediction, up to 11% in tuberculosis and up to 5% in malaria. Our approach finds proteomic biomarkers that correlate with complex combinations of clinical biomarkers. Using the clinical biomarkers improves the accuracy of diagnostic class prediction while not requiring the measurement of plasma proteomic profiles for each subject. Our approach makes it feasible to use 'omics data to build accurate diagnostic algorithms that can be deployed to community health centres lacking the expensive 'omics measurement capabilities.
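The abstract does not spell out the SCCA algorithm, so the following is only a minimal sketch of what an asymmetrical sparse CCA can look like, not the authors' method. It assumes a penalized-decomposition style simplification in which the within-view covariances are treated as identity, and it applies an L1 soft-threshold to the 'omics weights only, leaving the clinical weights dense. The function names, the penalty `lam`, and the iteration settings are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam (the L1 proximal operator)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def asymmetric_sparse_cca(X, Y, lam=0.3, n_iter=200, tol=1e-8):
    """Illustrative asymmetric sparse CCA (assumed formulation, see lead-in).

    X : (n, p) 'omics view, receives sparse weights u
    Y : (n, q) clinical view, receives dense weights v
    Columns must be non-constant because they are standardized here.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]  # (p, q) cross-correlation matrix

    # Initialize v with the leading right singular vector of C.
    v = np.linalg.svd(C, full_matrices=False)[2][0]
    u = np.zeros(C.shape[0])

    for _ in range(n_iter):
        # Sparse update for the 'omics weights.
        u_new = soft_threshold(C @ v, lam)
        norm_u = np.linalg.norm(u_new)
        if norm_u == 0.0:
            raise ValueError("lam is too large: all 'omics weights were zeroed")
        u_new /= norm_u

        # Dense (unpenalized) update for the clinical weights.
        v_new = C.T @ u_new
        v_new /= np.linalg.norm(v_new)

        converged = np.linalg.norm(u_new - u) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v
```

Under these assumptions, the nonzero entries of `u` play the role of the selected 'omic biomarkers, while `Y @ v` is a projection computed from clinical measurements alone. In practice `lam` would need to be tuned, for example by cross-validation.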