Gaskins Jeremy T, Fuentes Claudio, De La Cruz Rolando
Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
Department of Statistics, Oregon State University, Corvallis, OR 97331, USA.
Biostatistics. 2022 Dec 12;24(1):209-225. doi: 10.1093/biostatistics/kxab026.
Across several medical fields, developing an approach for disease classification is an important challenge. The usual procedure is to fit one model for the longitudinal response in the healthy population and a different model for the longitudinal response in the diseased population, and then apply Bayes' theorem to obtain disease probabilities given the responses. Unfortunately, when substantial heterogeneity exists within each population, this type of Bayes classification may perform poorly. In this article, we develop a new approach by fitting a Bayesian nonparametric model for the joint outcome of disease status and longitudinal response, and we then perform classification through the clustering induced by the Dirichlet process. This approach is highly flexible and allows for multiple subpopulations of healthy, diseased, and possibly mixed membership. In addition, we introduce a Markov chain Monte Carlo sampling scheme that facilitates the assessment of the inference and prediction capabilities of our model. Finally, we demonstrate the method by predicting pregnancy outcomes using longitudinal profiles of human chorionic gonadotropin beta subunit hormone levels in a sample of Chilean women being treated with assisted reproductive therapy.
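The "usual procedure" described in the abstract (one model per population, combined through Bayes' theorem) can be illustrated with a minimal sketch. It is not the authors' method and does not implement the proposed Dirichlet process approach. Everything specific here is an assumption made for illustration: the multivariate normal model for each group, the simulated profiles at four time points, the use of the sample proportion as the prior, and the hypothetical function names `fit_gaussian`, `log_density`, and `disease_probability`.

```python
import numpy as np

def fit_gaussian(profiles):
    """Fit a multivariate normal to an (n_subjects, n_timepoints) array (assumed model)."""
    mean = profiles.mean(axis=0)
    # A small ridge keeps the estimated covariance invertible.
    cov = np.cov(profiles, rowvar=False) + 1e-6 * np.eye(profiles.shape[1])
    return mean, cov

def log_density(y, mean, cov):
    """Log density of a multivariate normal evaluated at one profile y."""
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

def disease_probability(y, healthy_fit, diseased_fit, prior_diseased):
    """P(diseased | y) from Bayes' theorem with two class-conditional models."""
    log_h = np.log(1 - prior_diseased) + log_density(y, *healthy_fit)
    log_d = np.log(prior_diseased) + log_density(y, *diseased_fit)
    # Normalise on the log scale for numerical stability.
    return float(np.exp(log_d - np.logaddexp(log_h, log_d)))

# Simulated (not real) longitudinal profiles: flat trend vs. steep trend.
rng = np.random.default_rng(0)
times = np.arange(4)
healthy = rng.normal(loc=1.0 + 0.2 * times, scale=0.5, size=(200, 4))
diseased = rng.normal(loc=1.0 + 1.0 * times, scale=0.5, size=(200, 4))

healthy_fit = fit_gaussian(healthy)
diseased_fit = fit_gaussian(diseased)
prior = len(diseased) / (len(healthy) + len(diseased))

# Classify two new profiles, one resembling each simulated group.
p_flat = disease_probability(1.0 + 0.2 * times, healthy_fit, diseased_fit, prior)
p_steep = disease_probability(1.0 + 1.0 * times, healthy_fit, diseased_fit, prior)
print(p_flat, p_steep)
```

Because each group is summarized by a single distribution, a baseline like this has no way to represent several distinct subpopulations within the healthy or diseased group, which is the heterogeneity problem the abstract raises.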