Chen Minhua, Zaas Aimee, Woods Christopher, Ginsburg Geoffrey S, Lucas Joseph, Dunson David, Carin Lawrence
Minhua Chen is a Ph.D. student, Electrical and Computer Engineering Department; Aimee Zaas is Associate Professor, Christopher Woods is Associate Professor, Geoffrey S. Ginsburg is Professor and Director of Genomic Medicine, and Joseph Lucas is Assistant Research Professor, Institute for Genome Sciences and Policy & Department of Medicine; David Dunson is Professor, Department of Statistical Science; and Lawrence Carin is Professor and Department Chair (
J Am Stat Assoc. 2011 Jan 1;106(496):1259-1279. doi: 10.1198/jasa.2011.ap10611.
There is often interest in predicting an individual's latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies performed with healthy human volunteers, we develop a novel time-aligned Bayesian dynamic factor analysis methodology. The time course trajectories in the gene expressions are related to a relatively low-dimensional vector of latent factors, which vary dynamically starting at the latent initiation time of infection. Using a nonparametric cure rate model for the latent initiation times, we allow selection of the genes in the viral response pathway, variability among individuals in infection times, and a subset of individuals who are not infected. As we demonstrate using held-out data, this statistical framework allows accurate predictions of infected individuals in advance of the development of clinical symptoms, without labeled data and even when the number of biomarkers vastly exceeds the number of individuals under study. Biological interpretation of several of the inferred pathways (factors) is provided.
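To make the structure described above concrete, the following is a minimal generative sketch of the time-aligned idea, not the model fitted in the paper. It assumes a single factor trajectory shared across subjects (a Gaussian random walk), a fixed probability of never being infected in place of the nonparametric cure rate model, and sparse Gaussian loadings in place of Bayesian gene selection. The function name `simulate_time_aligned_factors` and all of its parameters are hypothetical and chosen for illustration only.

```python
import random

def simulate_time_aligned_factors(n_subjects=10, n_genes=50, n_times=16,
                                  n_factors=3, p_uninfected=0.4,
                                  n_active_genes=8, noise_sd=0.1, seed=0):
    """Toy simulation (illustrative assumption, not the authors' model)."""
    rng = random.Random(seed)

    # Cure-rate idea: a subject is either never infected (onset is None)
    # or has a latent onset time in the first half of the study.
    onset = [None if rng.random() < p_uninfected else rng.randrange(n_times // 2)
             for _ in range(n_subjects)]

    # Sparse loadings: only a few genes respond to the latent factors.
    active = set(rng.sample(range(n_genes), n_active_genes))
    loadings = [[rng.gauss(0, 1) if g in active else 0.0 for _ in range(n_factors)]
                for g in range(n_genes)]

    # Factor trajectory indexed by time since onset: a random walk that
    # starts at zero at the moment of infection.
    trajectory = [[0.0] * n_factors]
    for _ in range(n_times - 1):
        trajectory.append([f + rng.gauss(0, 0.5) for f in trajectory[-1]])

    # expression[i][t][g] is noise, plus loadings times factors once
    # subject i has passed their own onset time.
    expression = []
    for i in range(n_subjects):
        series = []
        for t in range(n_times):
            row = [rng.gauss(0, noise_sd) for _ in range(n_genes)]
            if onset[i] is not None and t >= onset[i]:
                factors = trajectory[t - onset[i]]
                for g in active:
                    row[g] += sum(l * f for l, f in zip(loadings[g], factors))
            series.append(row)
        expression.append(series)
    return expression, onset, loadings
```

In this sketch, two infected subjects share the same trajectory but see it shifted by their own onset times, which is the sense in which the data are "time-aligned"; uninfected subjects contribute noise only. Inference in the paper runs in the opposite direction, recovering onset times, factors, and active genes from the observed expression data.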