School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, United Kingdom.
BrisEngBio, School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad277.
The ability to measure the phenotype of millions of different genetic designs using Massively Parallel Reporter Assays (MPRAs) has revolutionized our understanding of genotype-to-phenotype relationships and opened avenues for data-centric approaches to biological design. However, our knowledge of how best to design these costly experiments and the effect that our choices have on the quality of the data produced is lacking.
In this article, we tackle the issues of data quality and experimental design by developing FORECAST, a Python package that supports the accurate simulation of cell-sorting and sequencing-based MPRAs and robust maximum likelihood-based inference of genetic design function from MPRA data. We use FORECAST's capabilities to reveal rules for MPRA experimental design that help ensure accurate genotype-to-phenotype links, and we show how simulating MPRA experiments can help us better understand the limits of prediction accuracy when these data are used to train deep learning-based classifiers. As the scale and scope of MPRAs grow, tools like FORECAST will help ensure that we make informed decisions during their development and make the most of the data they produce.
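The abstract does not describe FORECAST's internal model or API, so the following is only a minimal, hypothetical sketch of the general idea behind simulating a cell-sorting MPRA and inferring a design's function by maximum likelihood. It assumes that each variant's log-fluorescence is normally distributed with mean `mu` and standard deviation `sigma`, that cells are sorted into fixed gates, and that sequencing read counts are proportional to the number of cells per gate. The function names (`bin_probabilities`, `simulate_sort`, `fit_ml`), the gate edges, and the grid-search fitting are illustrative assumptions, not FORECAST's interface.

```python
import math

import numpy as np


def bin_probabilities(mu, sigma, edges):
    """Probability that a cell with log-fluorescence ~ N(mu, sigma^2) lands in each gate.

    `edges` are the (hypothetical) gate boundaries, starting at -inf and ending at +inf.
    """
    cdf = np.array([0.5 * (1.0 + math.erf((e - mu) / (sigma * math.sqrt(2.0)))) for e in edges])
    return np.diff(cdf)


def simulate_sort(mu, sigma, edges, n_cells, rng):
    """Simulate how many of `n_cells` cells of one variant fall into each gate."""
    return rng.multinomial(n_cells, bin_probabilities(mu, sigma, edges))


def log_likelihood(counts, mu, sigma, edges):
    """Multinomial log-likelihood of the binned counts (constant term omitted)."""
    p = np.clip(bin_probabilities(mu, sigma, edges), 1e-300, None)  # avoid log(0) for empty gates
    return float(np.sum(counts * np.log(p)))


def fit_ml(counts, edges, mu_grid, sigma_grid):
    """Maximum likelihood estimate of (mu, sigma) by exhaustive grid search."""
    best_mu, best_sigma, best_ll = None, None, -np.inf
    for mu in mu_grid:
        for sigma in sigma_grid:
            ll = log_likelihood(counts, mu, sigma, edges)
            if ll > best_ll:
                best_mu, best_sigma, best_ll = mu, sigma, ll
    return best_mu, best_sigma


# Usage: one variant, four finite gate boundaries on a log10 fluorescence scale (illustrative values).
rng = np.random.default_rng(0)
edges = [-np.inf, 1.0, 2.0, 3.0, 4.0, np.inf]
counts = simulate_sort(mu=2.6, sigma=0.4, edges=edges, n_cells=5000, rng=rng)
mu_hat, sigma_hat = fit_ml(counts, edges, np.arange(0.0, 5.0, 0.02), np.arange(0.1, 1.5, 0.02))
print(counts, mu_hat, sigma_hat)
```

Because the likelihood is computed from the gate probabilities rather than from bin midpoints, the estimate uses the full shape of the binned distribution. Rerunning the simulation with fewer cells or coarser gates is one simple way to explore how such design choices affect estimation error.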
The FORECAST package is available at: https://gitlab.com/Pierre-Aurelien/forecast. Code for the deep learning analysis performed in this study is available at: https://gitlab.com/Pierre-Aurelien/rebeca.