Simulation and Modelling Sciences, Pfizer Inc., Groton, Connecticut 06340, United States.
Discovery Sciences, Pfizer Inc., Groton, Connecticut 06340, United States.
J Chem Inf Model. 2022 May 9;62(9):2239-2247. doi: 10.1021/acs.jcim.1c00986. Epub 2021 Dec 4.
An approach for estimating the noise level of a DNA-Encoded Library (DEL) selection experiment has been developed by analyzing data sets of replicate selections. Using a logarithmic transformation of the read counts associated with each compound, together with the subset of compounds with the highest counts, it is possible to assess data quality by normalizing the replicates and to use the same data to estimate the noise in the experiment. The noise level depends on sequencing depth as well as on the specific selection conditions. The noise estimate is independent of any cutoff used to remove low-frequency compounds from the analysis. Removing compounds with only 1-5 read counts greatly reduces some of the challenges encountered in DEL data analysis, as it can shrink the data set more than 100-fold without affecting the interpretation of the results.
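The workflow summarized in the abstract (log-transform the counts, take the highest-count compounds, normalize the replicates, then estimate the noise) can be illustrated with a minimal Python sketch. The abstract does not give the exact estimator, so the details here are assumptions made for illustration only: the function names `filter_low_counts` and `estimate_noise`, the `top_n` parameter, the use of the median log-ratio as the normalization offset, and the use of the standard deviation of the residual log-ratios as the noise measure are all hypothetical and are not taken from the paper.

```python
import math
import statistics


def filter_low_counts(counts, min_count=6):
    """Drop compounds with fewer than min_count reads.

    The default of 6 removes compounds with only 1-5 reads, the range
    mentioned in the abstract.
    """
    return {cid: n for cid, n in counts.items() if n >= min_count}


def estimate_noise(rep_a, rep_b, top_n=100):
    """Return (offset, noise) for two replicate {compound_id: count} dicts.

    ASSUMPTION: this is an illustrative estimator, not the published one.
    offset: median log10 count ratio between replicates over the top
            compounds, used as a normalization for sequencing depth.
    noise:  sample standard deviation of the log10 ratios after the
            offset is subtracted.
    """
    # Only compounds observed in both replicates can be compared.
    shared = [c for c in rep_a if rep_b.get(c, 0) > 0 and rep_a[c] > 0]
    # Rank by combined counts and keep the highest-count subset.
    shared.sort(key=lambda c: rep_a[c] + rep_b[c], reverse=True)
    top = shared[:top_n]
    if len(top) < 2:
        raise ValueError("need at least two shared compounds")

    # Log-transform the counts and compare the replicates.
    diffs = [math.log10(rep_a[c]) - math.log10(rep_b[c]) for c in top]
    offset = statistics.median(diffs)
    residuals = [d - offset for d in diffs]
    noise = statistics.stdev(residuals)
    return offset, noise
```

Because the estimate in this sketch is computed only from the highest-count compounds, applying `filter_low_counts` beforehand does not change it as long as the top compounds sit above the cutoff, which mirrors the abstract's statement that the noise estimate is independent of the low-count cutoff.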