Duke Human Vaccine Institute, Duke University, Durham, NC, 27710, USA.
Department of Statistical Science, Duke University, Durham, NC, 27708, USA.
Sci Rep. 2018 Jun 22;8(1):9551. doi: 10.1038/s41598-018-27531-w.
High-throughput screening of compounds (chemicals) is an essential part of drug discovery, involving thousands to millions of compounds, with the purpose of identifying candidate hits. Most statistical tools, including the industry-standard B-score method, work on individual compound plates and neither exploit cross-plate correlation nor share statistical strength among plates. We present a new statistical framework for high-throughput screening of compounds based on Bayesian nonparametric modeling. The proposed approach identifies candidate hits from multiple plates simultaneously, sharing statistical strength among plates and providing more robust estimates of compound activity. It can flexibly accommodate arbitrary distributions of compound activities and is applicable to any plate geometry. The algorithm provides a principled statistical approach for hit identification and false discovery rate control. Experiments demonstrate significant improvements in hit identification sensitivity and specificity over the B-score and R-score methods, which are highly sensitive to threshold choice. These improvements are maintained at low hit rates. The framework is implemented as an efficient R extension package, BHTSpack, and is suitable for large-scale data sets.
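For context on the per-plate baseline mentioned above, the following is a minimal sketch of a conventional B-score calculation: Tukey median polish removes additive row and column effects from a single plate, and the residuals are scaled by 1.4826 times their median absolute deviation. It is not the Bayesian nonparametric method of the paper and does not use the BHTSpack API. The function names, the simulated 8 x 12 plate, and the cutoff of -3 are assumptions made purely for illustration.

```python
import random
from statistics import median


def median_polish_residuals(plate, max_iter=10, tol=1e-9):
    """Residuals after sweeping out row and column medians (Tukey median polish)."""
    resid = [[float(v) for v in row] for row in plate]
    n_cols = len(resid[0])
    for _ in range(max_iter):
        largest_shift = 0.0
        # Sweep out row medians.
        for row in resid:
            m = median(row)
            largest_shift = max(largest_shift, abs(m))
            for j in range(n_cols):
                row[j] -= m
        # Sweep out column medians.
        for j in range(n_cols):
            m = median(row[j] for row in resid)
            largest_shift = max(largest_shift, abs(m))
            for row in resid:
                row[j] -= m
        # Stop once a full sweep no longer changes the residuals.
        if largest_shift < tol:
            break
    return resid


def b_scores(plate):
    """Per-plate B-scores: polished residuals divided by 1.4826 * MAD."""
    resid = median_polish_residuals(plate)
    flat = [v for row in resid for v in row]
    center = median(flat)
    mad = median(abs(v - center) for v in flat)
    if mad == 0:
        raise ValueError("MAD of residuals is zero; B-scores are undefined")
    scale = 1.4826 * mad
    return [[v / scale for v in row] for row in resid]


# Simulated plate (assumption): additive row/column effects plus unit noise,
# with one strongly inhibited well at row 3, column 7.
rng = random.Random(0)
plate = [[100 + 2 * i + 3 * j + rng.gauss(0, 1) for j in range(12)]
         for i in range(8)]
plate[3][7] -= 20

scores = b_scores(plate)
# Hypothetical cutoff; the abstract notes that results are sensitive to this choice.
hits = [(i, j) for i in range(8) for j in range(12) if scores[i][j] < -3]
```

Each plate is scored in isolation here, so nothing learned from one plate informs another, and the list of hits depends directly on the chosen cutoff. These are the two limitations the abstract attributes to per-plate methods.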