CRISPR/Cas9 cleavage efficiency regression through boosting algorithms and Markov sequence profiling.

Affiliations

Faculty of Engineering and Information Technology, Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW, Australia.

Faculty of Engineering and Information Technologies, School of Information Technologies, University of Sydney, Darlington, NSW, Australia.

Publication information

Bioinformatics. 2018 Sep 15;34(18):3069-3077. doi: 10.1093/bioinformatics/bty298.

Abstract

MOTIVATION

The CRISPR/Cas9 system is a widely used genome editing tool. A prediction problem of great interest for this system is how to select optimal single-guide RNAs (sgRNAs) whose cleavage efficiency is high while their off-target effect is low.

RESULTS

This work proposes a two-step averaging method (TSAM) for the regression of cleavage efficiencies of a set of sgRNAs, which averages the efficiency scores predicted by a boosting algorithm with those predicted by a support vector machine (SVM). We also propose using profiled Markov properties as novel features to capture the global characteristics of sgRNAs. These new features are combined with the top features ranked by the boosting algorithm to train the SVM regressor. TSAM improved the mean Spearman correlation coefficients compared with state-of-the-art methods on benchmark datasets containing thousands of human, mouse and zebrafish sgRNAs. Our method can also be converted to make binary distinctions between efficient and inefficient sgRNAs, with performance superior to that of existing methods. The analysis reveals that highly efficient sgRNAs have a lower melting temperature in the middle of the spacer, cut at sites closer to the 5'-end of the genome, and contain more 'A' but fewer 'G' than inefficient ones. Further comprehensive analysis also demonstrates that our tool predicts an sgRNA's cutting efficiency with consistently good performance regardless of whether the sgRNA is expressed from a U6 promoter in cells or from a T7 promoter in vitro.

AVAILABILITY AND IMPLEMENTATION

An online tool is available at http://www.aai-bioinfo.com/CRISPR/. Python and Matlab source code is freely available at https://github.com/penn-hui/TSAM.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
