Jeffrey A. Ruffolo, Stephen Nayfach, Joseph Gallagher, Aadyot Bhatnagar, Joel Beazer, Riffat Hussain, Jordan Russ, Jennifer Yip, Emily Hill, Martin Pacesa, Alexander J. Meeske, Peter Cameron, Ali Madani
Profluent Bio, Berkeley, CA, USA.
Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland.
Nature. 2025 Jul 30. doi: 10.1038/s41586-025-09298-z.
Gene editing has the potential to solve fundamental challenges in agriculture, biotechnology and human health. CRISPR-based gene editors derived from microorganisms, although powerful, often show notable functional tradeoffs when ported into non-native environments, such as human cells. Artificial-intelligence-enabled design provides a powerful alternative with the potential to bypass evolutionary constraints and generate editors with optimal properties. Here, using large language models trained on biological diversity at scale, we demonstrate successful precision editing of the human genome with a programmable gene editor designed with artificial intelligence. To achieve this goal, we curated a dataset of more than 1 million CRISPR operons through systematic mining of 26 terabases of assembled genomes and metagenomes. We demonstrate the capacity of our models by generating 4.8× the number of protein clusters across CRISPR-Cas families found in nature and tailoring single-guide RNA sequences for Cas9-like effector proteins. Several of the generated gene editors show comparable or improved activity and specificity relative to SpCas9, the prototypical gene editing effector, while being 400 mutations away in sequence. Finally, we demonstrate that an artificial-intelligence-generated gene editor, denoted as OpenCRISPR-1, exhibits compatibility with base editing. We release OpenCRISPR-1 to facilitate broad, ethical use across research and commercial applications.