Timonen Juho, Mannerström Henrik, Vehtari Aki, Lähdesmäki Harri
Department of Computer Science, Aalto University, Espoo 00076, Finland.
Bioinformatics. 2021 Jul 27;37(13):1860-1867. doi: 10.1093/bioinformatics/btab021.
Longitudinal study designs are indispensable for studying disease progression. Inferring covariate effects from longitudinal data, however, requires interpretable methods that can model complicated covariance structures and detect non-linear effects of both categorical and continuous covariates, as well as their interactions. Disease effects are hard to detect because they often occur rapidly near the disease initiation time, which cannot be observed exactly. A further challenge is that the effect magnitude can vary across subjects.
We present lgpr, a widely applicable and interpretable method for non-parametric analysis of longitudinal data using additive Gaussian processes. We demonstrate that it outperforms previous approaches in identifying the relevant categorical and continuous covariates in various settings. Furthermore, it implements important novel features, including the ability to account for heterogeneity of covariate effects across subjects, for uncertainty in the timing of those effects, and for appropriate observation models for different types of biomedical data. The lgpr tool is implemented as a comprehensive and user-friendly R package.
lgpr is available at jtimonen.github.io/lgpr-usage with documentation, tutorials, test data and code for reproducing the experiments of this article.
Supplementary data are available at Bioinformatics online.
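To make the idea of an additive Gaussian process concrete, the following is a minimal sketch in plain Python. It is not the lgpr API and not the authors' implementation. The kernel choices (a squared-exponential kernel on age, plus an indicator kernel on a group label multiplied by a second squared-exponential kernel), the fixed hyperparameters, and all function names are assumptions made for illustration, and no hyperparameter inference is performed. The sketch shows why additivity aids interpretation: the posterior mean splits exactly into one contribution per kernel component.

```python
import math

def se_kernel(t1, t2, lengthscale, magnitude):
    """Squared-exponential kernel for a continuous covariate such as age."""
    return magnitude ** 2 * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

# Each data point is a tuple (age, group). Hyperparameters are arbitrary.
def shared_kernel(a, b):
    """Age effect shared by all groups (hypothetical component)."""
    return se_kernel(a[0], b[0], lengthscale=2.0, magnitude=1.0)

def group_kernel(a, b):
    """Group-specific age effect: nonzero only within the same group."""
    same_group = 1.0 if a[1] == b[1] else 0.0
    return same_group * se_kernel(a[0], b[0], lengthscale=2.0, magnitude=0.5)

def additive_kernel(a, b):
    """Additive GP prior: the kernel is the sum of the components."""
    return shared_kernel(a, b) + group_kernel(a, b)

def gram(points_a, points_b, kernel):
    """Kernel matrix between two lists of points."""
    return [[kernel(a, b) for b in points_b] for a in points_a]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def posterior_mean(train_x, train_y, test_x, full_kernel,
                   cross_kernel=None, noise_var=0.1):
    """GP posterior mean under Gaussian noise.

    With cross_kernel=None this is the mean of the full additive model.
    Passing a single component as cross_kernel gives the mean of that
    component alone, conditioned on the same data.
    """
    if cross_kernel is None:
        cross_kernel = full_kernel
    K = gram(train_x, train_x, full_kernel)
    for i in range(len(K)):
        K[i][i] += noise_var
    alpha = solve(K, train_y)
    K_star = gram(test_x, train_x, cross_kernel)
    return [sum(k * a for k, a in zip(row, alpha)) for row in K_star]

# Made-up toy data: two groups measured at ages 0..5.
train_x = [(float(t), g) for g in (0, 1) for t in range(6)]
train_y = [math.sin(t) + (0.5 * t if g == 1 else 0.0) for (t, g) in train_x]
test_x = [(2.5, 0), (2.5, 1)]

total = posterior_mean(train_x, train_y, test_x, additive_kernel)
shared_part = posterior_mean(train_x, train_y, test_x,
                             additive_kernel, shared_kernel)
group_part = posterior_mean(train_x, train_y, test_x,
                            additive_kernel, group_kernel)
```

Here `total[i]` equals `shared_part[i] + group_part[i]` up to floating-point error, so each component can be inspected on its own. Judging whether a component is relevant, which is what the abstract describes as covariate selection, would require further steps that this sketch does not attempt.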