Lemmon Gordon, Wesolowski Sergiusz, Henrie Alex, Tristani-Firouzi Martin, Yandell Mark
Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
Utah Center for Genetic Discovery and Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
Nat Comput Sci. 2021 Oct;1(10):694-702. doi: 10.1038/s43588-021-00141-9. Epub 2021 Oct 21.
Discovering the concomitant occurrence of distinct medical conditions in a patient, also known as comorbidities, is a prerequisite for creating patient outcome prediction tools. Current comorbidity discovery applications are designed for small datasets and use stratification to control for confounding variables such as age, sex or ancestry. Stratification lowers false positive rates, but reduces power, as the size of the study cohort is decreased. Here we describe a Poisson binomial-based approach to comorbidity discovery (PBC) designed for big-data applications that circumvents the need for stratification. PBC adjusts for confounding demographic variables on a per-patient basis and models temporal relationships. We benchmark PBC using two datasets to compute comorbidity statistics on 4,623,841 pairs of potentially comorbid medical terms. The results of this computation are provided as a searchable web resource. Compared with current methods, the PBC approach reduces false positive associations while retaining statistical power to discover true comorbidities.
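The core statistical idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each patient has demographically adjusted probabilities of carrying term A and term B, and that under the null hypothesis the two terms occur independently within a patient. The number of patients carrying both terms then follows a Poisson binomial distribution, and the upper tail gives a p-value for the observed co-occurrence count. All function names and the example probabilities are hypothetical, and the temporal modelling mentioned in the abstract is not covered.

```python
from typing import List, Sequence


def cooccurrence_probs(p_a: Sequence[float], p_b: Sequence[float]) -> List[float]:
    """Per-patient probability of carrying both terms, assuming independence.

    p_a[i] and p_b[i] are hypothetical, demographically adjusted probabilities
    that patient i carries term A and term B respectively.
    """
    return [a * b for a, b in zip(p_a, p_b)]


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """Exact PMF of a sum of independent Bernoulli(p_i) variables.

    Uses the standard O(n^2) dynamic-programming recursion: after processing
    each patient, pmf[k] holds the probability of exactly k successes so far.
    """
    pmf = [1.0]  # with zero patients, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this patient does not have both terms
            nxt[k + 1] += mass * p      # this patient has both terms
        pmf = nxt
    return pmf


def upper_tail_p_value(probs: Sequence[float], observed: int) -> float:
    """P(X >= observed) for X ~ PoissonBinomial(probs)."""
    pmf = poisson_binomial_pmf(probs)
    return min(1.0, sum(pmf[max(0, observed):]))


# Illustrative values only: four patients with differing adjusted risks.
p_a = [0.10, 0.40, 0.05, 0.30]
p_b = [0.20, 0.50, 0.10, 0.25]
p_value = upper_tail_p_value(cooccurrence_probs(p_a, p_b), observed=2)
```

Because every patient contributes an individual probability, no cohort has to be split into age, sex or ancestry strata. The quadratic recursion is only practical for small cohorts; a big-data setting would need a faster exact or approximate method.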