Biostatistics Division, Department of Preventive Medicine, Northwestern University, Chicago, IL 60611;
Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208.
Proc Natl Acad Sci U S A. 2018 Sep 25;115(39):E9247-E9256. doi: 10.1073/pnas.1800314115. Epub 2018 Sep 10.
Circadian clocks play a key role in regulating a vast array of biological processes, with significant implications for human health. Accurate assessment of physiological time using transcriptional biomarkers found in human blood can significantly improve diagnosis of circadian disorders and optimize the delivery time of therapeutic treatments. To be useful, such a test must be accurate, minimally burdensome to the patient, and readily generalizable to new data. A major obstacle in development of gene expression biomarker tests is the diversity of measurement platforms and the inherent variability of the data, often resulting in predictors that perform well in the original datasets but cannot be universally applied to new samples collected in other settings. Here, we introduce TimeSignature, an algorithm that robustly infers circadian time from gene expression. We demonstrate its application in data from three independent studies using distinct microarrays and further validate it against a new set of samples profiled by RNA-sequencing. Our results show that TimeSignature is more accurate and efficient than competing methods, estimating circadian time to within 2 h for the majority of samples. Importantly, we demonstrate that once trained on data from a single study, the resulting predictor can be universally applied to yield highly accurate results in new data from other studies independent of differences in study population, patient protocol, or assay platform without renormalizing the data or retraining. This feature is unique among expression-based predictors and addresses a major challenge in the development of generalizable, clinically useful tests.
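As an illustration of what "to within 2 h" means for a quantity that wraps around a 24 h clock, the following minimal sketch computes circular prediction error and the fraction of samples falling inside a tolerance. It is not the TimeSignature implementation, which the abstract does not describe; the function names, the cosine/sine encoding of clock time, and the example values are assumptions introduced here for illustration only.

```python
import math

def circular_error_hours(predicted, actual, period=24.0):
    """Smallest absolute difference between two clock times on a circle.

    Illustrative helper (not from the paper): 23.5 h and 0.5 h are
    1 h apart, not 23 h apart.
    """
    diff = abs(predicted - actual) % period
    return min(diff, period - diff)

def fraction_within(predicted, actual, tolerance=2.0, period=24.0):
    """Fraction of samples whose circular error is at most `tolerance` hours."""
    errors = [circular_error_hours(p, a, period)
              for p, a in zip(predicted, actual)]
    return sum(e <= tolerance for e in errors) / len(errors)

def time_to_xy(hours, period=24.0):
    """Encode a clock time as a point on the unit circle (assumed encoding)."""
    angle = 2.0 * math.pi * hours / period
    return math.cos(angle), math.sin(angle)

def xy_to_time(x, y, period=24.0):
    """Decode a (cos, sin)-like pair back to a clock time in hours."""
    angle = math.atan2(y, x)
    return (angle * period / (2.0 * math.pi)) % period

if __name__ == "__main__":
    # Hypothetical predicted and true sampling times, in hours.
    predicted = [23.5, 12.0, 6.0]
    actual = [0.5, 15.0, 6.5]
    print(fraction_within(predicted, actual))  # share of samples within 2 h
```

Treating time of day as an angle avoids the artificial discontinuity at midnight, which is why the error is measured around the circle rather than as a plain difference.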