Yu Jessica S, Pertusi Dante A, Adeniran Adebola V, Tyo Keith E J
Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA.
Bioinformatics. 2017 Mar 15;33(6):909-916. doi: 10.1093/bioinformatics/btw710.
High-throughput screening by fluorescence-activated cell sorting (FACS) is a common task in protein engineering and directed evolution. It can also be a rate-limiting step if high false positive or false negative rates necessitate multiple rounds of enrichment. Current FACS software requires the user to define sorting gates by intuition and is practically limited to two dimensions. When multiple rounds of enrichment are needed, the software cannot forecast how much enrichment effort will be required.
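To make concrete how gate error rates translate into enrichment effort, here is a minimal sketch under simplifying assumptions. It is a deterministic expected-value model, not the Bayesian method presented by the authors: each round applies the same gate with a fixed true positive rate (`tpr`) and false positive rate (`fpr`), sampling noise is ignored, and regrowth between rounds is assumed not to change population composition. The function names and the example numbers are hypothetical.

```python
def enrich_once(p: float, tpr: float, fpr: float) -> float:
    """Return the fraction of true positives among cells that pass one gate.

    p   -- fraction of true positives before sorting
    tpr -- probability that a true positive passes the gate (sensitivity)
    fpr -- probability that a negative passes the gate (false positive rate)
    This is Bayes' rule: P(positive | passed gate).
    """
    passed_pos = p * tpr
    passed_neg = (1.0 - p) * fpr
    return passed_pos / (passed_pos + passed_neg)


def rounds_needed(p0: float, tpr: float, fpr: float,
                  target: float = 0.5, max_rounds: int = 100) -> int:
    """Count sorting rounds until the positive fraction reaches `target`.

    Assumes (hypothetically) that every round uses the same gate and that
    regrowth between rounds does not change the population composition.
    """
    if not (0.0 < p0 < 1.0 and 0.0 < tpr <= 1.0 and 0.0 < fpr < 1.0):
        raise ValueError("p0 and fpr must lie in (0, 1); tpr in (0, 1]")
    p = p0
    rounds = 0
    while p < target:
        if rounds == max_rounds:
            raise RuntimeError("target not reached within max_rounds")
        p = enrich_once(p, tpr, fpr)
        rounds += 1
    return rounds


# Hypothetical example: one positive per million cells (e.g. p0 = 1 / library size).
print(rounds_needed(1e-6, tpr=0.9, fpr=0.01))   # 4
print(rounds_needed(1e-6, tpr=0.9, fpr=0.001))  # 3
```

In this model each round multiplies the odds of picking a positive by `tpr / fpr`, so in the example a tenfold lower false positive rate saves one round of sorting.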
We have developed CellSort, a support vector machine (SVM) algorithm that uses positive and negative control populations to learn optimal sorting gates. CellSort can exploit more than two dimensions to better distinguish between populations. We also present a Bayesian approach to predict the number of sorting rounds required to enrich a population from a given library size. This Bayesian approach allowed us to identify strategies for biasing the sorting gates to reduce the required number of enrichment rounds. The algorithm should be generally useful for improving sorting outcomes and reducing effort when using FACS.
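The idea of learning a gate from control populations can be sketched as follows. This sketch swaps in a plain linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss; CellSort's actual kernel, training procedure, and parameters are not described in this abstract, so the code should not be read as its implementation. The control data are simulated, and the helper names (`make_controls`, `train_linear_svm`, `in_gate`) and the `offset` parameter are hypothetical. Three channels are used to show that the learned gate is a hyperplane in more than two dimensions.

```python
import random


def make_controls(n, center, sd, rng):
    """Simulate n control events as points in a multichannel fluorescence space (hypothetical data)."""
    return [[rng.gauss(c, sd) for c in center] for _ in range(n)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Fit (w, b) by full-batch subgradient descent on the soft-margin SVM objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))).

    X is a list of feature vectors (one per event); y holds labels +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # hinge loss is active for this event
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def in_gate(x, w, b, offset=0.0):
    """An event is sorted if its SVM score exceeds `offset`; offset > 0 tightens the gate."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b > offset


# Hypothetical positive and negative controls measured on three channels.
rng = random.Random(0)
pos = make_controls(200, [2.0, 2.0, 2.0], 0.5, rng)
neg = make_controls(200, [0.0, 0.0, 0.0], 0.5, rng)
X = pos + neg
y = [1] * len(pos) + [-1] * len(neg)

w, b = train_linear_svm(X, y)
accuracy = sum(in_gate(x, w, b) == (label == 1) for x, label in zip(X, y)) / len(X)
false_pos = sum(in_gate(x, w, b) for x in neg)
false_pos_biased = sum(in_gate(x, w, b, offset=0.5) for x in neg)
print(f"training accuracy {accuracy:.3f}; negatives passing gate: {false_pos} -> {false_pos_biased}")
```

Raising `offset` is one simple way to bias a gate toward fewer false positives at the cost of more false negatives; whether this corresponds to the biasing strategies the authors derive is an assumption.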
Source code is available at http://tyolab.northwestern.edu/tools/. Contact: k-tyo@northwestern.edu.
Supplementary data are available at Bioinformatics online.