Listgarten Jennifer, Weinstein Michael, Kleinstiver Benjamin P, Sousa Alexander A, Joung J Keith, Crawford Jake, Gao Kevin, Hoang Luong, Elibol Melih, Doench John G, Fusi Nicolo
Microsoft Research, Cambridge, MA, USA.
Molecular, Cell, and Developmental Biology, and Quantitative and Computational Biosciences Institute, University of California Los Angeles, Los Angeles, CA, USA.
Nat Biomed Eng. 2018 Jan;2(1):38-47. doi: 10.1038/s41551-017-0178-6. Epub 2018 Jan 10.
The CRISPR-Cas9 system provides unprecedented genome editing capabilities. However, off-target effects lead to sub-optimal usage and are also a bottleneck in the development of therapeutic applications. Herein, we introduce the first machine learning-based approach to off-target prediction, yielding a state-of-the-art model for CRISPR-Cas9 that outperforms all other guide design services. Our approach, Elevation, consists of two interdependent machine learning models: one that scores individual guide-target pairs, and another that aggregates these guide-target scores into a single, overall summary score for each guide. Through systematic investigation, we demonstrate that Elevation performs substantially better than competing approaches on both tasks. Additionally, we are the first to systematically evaluate approaches on the guide summary score problem; we show that the most widely used method at times performs no better than random, whereas Elevation consistently outperforms it, sometimes by an order of magnitude. We also introduce an evaluation method that balances errors between active and inactive guides, thereby encapsulating a range of practical use cases; Elevation is consistently superior to other methods across this entire range. Finally, because off-target prediction is large in scale and computationally demanding, we have developed a cloud-based service for quick retrieval. This service provides end-to-end guide design by also incorporating our previously reported on-target model, Azimuth (https://crispr.ml).
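The abstract describes Elevation as two stages: a first model scores each guide-target pair, and a second model aggregates those pair scores into one summary score per guide. The minimal sketch below shows only that data flow. Both scoring functions are hypothetical toys standing in for Elevation's learned models, and the names `toy_pair_score`, `toy_aggregate`, and `summary_guide_score` are illustrative assumptions, not part of the Elevation code or the crispr.ml service.

```python
from typing import Callable, Sequence

# Type aliases for the two stages described in the abstract.
PairScorer = Callable[[str, str], float]     # stage 1: (guide, target) -> score
Aggregator = Callable[[Sequence[float]], float]  # stage 2: pair scores -> summary


def toy_pair_score(guide: str, target: str) -> float:
    """Hypothetical stage-1 scorer (NOT Elevation's model).

    Returns the fraction of positions where guide and target agree,
    so a perfect match scores 1.0.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    matches = sum(g == t for g, t in zip(guide, target))
    return matches / len(guide)


def toy_aggregate(pair_scores: Sequence[float]) -> float:
    """Hypothetical stage-2 aggregator (NOT Elevation's model): a plain sum."""
    return float(sum(pair_scores))


def summary_guide_score(
    guide: str,
    targets: Sequence[str],
    score_pair: PairScorer = toy_pair_score,
    aggregate: Aggregator = toy_aggregate,
) -> float:
    """Score every guide-target pair, then collapse the scores into one number."""
    pair_scores = [score_pair(guide, t) for t in targets]
    return aggregate(pair_scores)


if __name__ == "__main__":
    # 10-mers are used for brevity; real SpCas9 guides are 20 nt plus a PAM.
    guide = "ACGTACGTAC"
    targets = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
    print(summary_guide_score(guide, targets))
```

Passing the scorer and aggregator as parameters keeps the two stages swappable, which mirrors the abstract's point that the summary score is built on top of the per-pair scores rather than computed independently.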