Institute for Advanced Study, Shenzhen University, Shenzhen, China.
Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia.
BMC Bioinformatics. 2022 Oct 25;23(1):444. doi: 10.1186/s12859-022-04986-3.
Anti-CRISPR proteins are potent modulators that inhibit the CRISPR-Cas immune system and have great potential as genome-editing tools in gene editing and gene therapy. Extensive studies have shown that anti-CRISPR proteins are essential for modifying endogenous genes, promoting the RNA-guided binding and cleavage of DNA or RNA substrates. In recent years, identifying and characterizing anti-CRISPR proteins has become an active and important research topic in bioinformatics. However, because most anti-CRISPR proteins share little similarity with those currently known, traditional screening methods are time-consuming and inefficient. Machine learning methods could fill this gap with their predictive capability and provide a new perspective on anti-CRISPR protein identification.
Here, we present a novel machine learning ensemble predictor, called PreAcrs, that identifies anti-CRISPR proteins directly from protein sequences. Three types of features and eight different machine learning algorithms were used to train PreAcrs. PreAcrs outperformed existing methods and significantly improved the prediction accuracy for identifying anti-CRISPR proteins.
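To make the idea of a sequence-based ensemble predictor concrete, the following is a minimal sketch and not the PreAcrs implementation. The abstract does not name the three features or the eight algorithms, so amino acid composition as the feature, soft voting as the combination rule, and the function names `amino_acid_composition` and `soft_vote` are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid (20 values).

    Assumption: this is an illustrative sequence-derived feature; the
    abstract does not say which three features PreAcrs uses.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def soft_vote(probabilities, threshold=0.5):
    """Average the base models' probabilities and apply a threshold.

    Assumption: soft voting is one common way to combine classifiers;
    the abstract does not state how PreAcrs combines its models.
    """
    if not probabilities:
        raise ValueError("need at least one base-model probability")
    score = mean(probabilities)
    return score, score >= threshold
```

In such a setup, each base classifier would be trained on feature vectors like the one returned by `amino_acid_composition`, and `soft_vote` would merge their predicted probabilities for a candidate protein into a single call.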
In summary, the PreAcrs predictor achieved competitive performance in predicting new anti-CRISPR proteins in terms of accuracy and robustness. We anticipate that PreAcrs will be a valuable tool for researchers seeking to speed up the identification of anti-CRISPR proteins. The source code is available at https://github.com/Lyn-666/anti_CRISPR.git.