University of Texas, M.D. Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA; UTHealth School of Biomedical Informatics, 7000 Fannin St., Houston, TX 77030, USA; Rice University, 6100 Main St., Houston, TX 77005, USA.
University of Texas, M.D. Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA.
J Biomed Inform. 2011 Dec;44 Suppl 1(Suppl 1):S69-S77. doi: 10.1016/j.jbi.2011.09.005. Epub 2011 Oct 1.
Proposing and executing clinical trials, computing quality measures, and discovering correlations between medical phenomena are all applications that need an accurate count of patients. However, existing sources of this type of patient information, including Clinical Data Warehouses (CDWs), may be incomplete or inaccurate. This research explores applying probabilistic techniques, supported by the MayBMS probabilistic database, to obtain accurate patient counts from a Clinical Data Warehouse containing synthetic patient data. We present a synthetic Clinical Data Warehouse and populate it with simulated data using a custom patient data generation engine. We then implement, evaluate and compare different techniques for obtaining patient counts. We model billing as a test for the presence of a condition. We compute billing's sensitivity and specificity in two ways: by conducting a "Simulated Expert Review", in which a representative sample of records is reviewed and labeled by experts, and by obtaining the ground truth for every record. We compute the posterior probability of a patient having a condition with two methods. The first is a "Bayesian Chain", which uses Bayes' Theorem to update the probability of a patient having a condition after each visit. The second is a "one-shot" approach that computes the probability of a patient having a condition based on whether the patient is ever billed for the condition. Our results demonstrate the utility of probabilistic approaches, which improve on the accuracy of raw counts. In particular, the simulated review paired with a single application of Bayes' Theorem produces the best results, with an average error rate of 2.1% compared to 43.7% for the straightforward billing counts. Overall, this research demonstrates that Bayesian probabilistic approaches improve patient counts on simulated patient populations. We believe that total patient counts based on billing data are one of many possible applications of our Bayesian framework. Use of these probabilistic techniques will enable more accurate patient counts and better results for applications requiring this metric.
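The two posterior computations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use MayBMS. All function names, the choice of prior, and the idea of estimating a patient count by summing posteriors are assumptions made here for illustration. The one-shot variant also assumes that sensitivity and specificity were measured for the "ever billed" test at the patient level, whereas the chain assumes per-visit values and treats visits as conditionally independent.

```python
from typing import Iterable, Sequence, Tuple


def sensitivity_specificity(
    labeled: Iterable[Tuple[bool, bool]]
) -> Tuple[float, float]:
    """Estimate sensitivity and specificity of billing as a test.

    `labeled` holds (billed, has_condition) pairs, e.g. from a reviewed
    sample of records. The pair layout is an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for billed, has_condition in labeled:
        if has_condition and billed:
            tp += 1
        elif has_condition:
            fn += 1
        elif billed:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def bayes_update(prior: float, billed: bool, sens: float, spec: float) -> float:
    """One application of Bayes' Theorem for a positive or negative test."""
    if billed:
        numerator = sens * prior
        denominator = numerator + (1.0 - spec) * (1.0 - prior)
    else:
        numerator = (1.0 - sens) * prior
        denominator = numerator + spec * (1.0 - prior)
    return numerator / denominator


def bayesian_chain(
    prior: float, visits: Sequence[bool], sens: float, spec: float
) -> float:
    """Update the probability after each visit; the posterior of one visit
    becomes the prior of the next."""
    probability = prior
    for billed in visits:
        probability = bayes_update(probability, billed, sens, spec)
    return probability


def one_shot(
    prior: float, visits: Sequence[bool], sens: float, spec: float
) -> float:
    """Single update based on whether the patient was ever billed."""
    return bayes_update(prior, any(visits), sens, spec)


def expected_count(posteriors: Iterable[float]) -> float:
    """Assumed aggregation: expected number of patients with the condition."""
    return sum(posteriors)
```

For example, with a hypothetical prior of 0.1 and sensitivity and specificity both 0.9, `bayesian_chain(0.1, [True, True], 0.9, 0.9)` raises the probability step by step across the two billed visits, while `one_shot(0.1, [False, True], 0.9, 0.9)` applies the theorem only once because the patient was billed at least once.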