Department of Computer Science, University of Oxford, Parks Road, Oxford, OX1 3QD, UK.
BMC Bioinformatics. 2024 Oct 15;25(1):330. doi: 10.1186/s12859-024-05898-0.
Base editing is an enhanced gene editing approach that enables the precise transformation of single nucleotides and has the potential to cure rare diseases. The design process of base editors is labour-intensive, and outcomes are not easily predictable. For any clinical use, base editing has to be accurate and efficient, so bystander mutations have to be minimized. In recent years, computational models to predict base editing outcomes have been developed. However, the overall robustness and performance of those models are limited. One way to improve performance is to train models on a large, diverse, and feature-rich dataset, which does not yet exist for the base editing field. Hence, we develop BE-dataHIVE, a MySQL database that covers over 460,000 gRNA target combinations. The current version of BE-dataHIVE consists of data from five studies and is enriched with melting temperatures and energy terms. Furthermore, multiple data structures for machine learning were precomputed and are directly available. The database can be accessed via our website (https://be-datahive.com/) or through an API, making it suitable for both practitioners and machine learning researchers.