Shen Yuning, Pressman Abe, Janzen Evan, Chen Irene A
Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA 93106, USA.
Department of Chemical Engineering, University of California, Santa Barbara, CA 93106, USA.
Nucleic Acids Res. 2021 Jul 9;49(12):e67. doi: 10.1093/nar/gkab199.
Characterizing genotype-phenotype relationships of biomolecules (e.g. ribozymes) requires accurate ways to measure activity for a large set of molecules. Kinetic measurement using high-throughput sequencing (e.g. k-Seq) is an emerging assay applicable in various domains that potentially scales up measurement throughput to over 10^6 unique nucleic acid sequences. However, maximizing the return of such assays requires understanding the technical challenges introduced by sequence heterogeneity and DNA sequencing. We characterized the k-Seq method in terms of model identifiability, effects of sequencing error, accuracy and precision using simulated datasets and experimental data from a variant pool constructed from previously identified ribozymes. Relative abundance, kinetic coefficients, and measurement noise were found to affect the measurement of each sequence. We introduced bootstrapping to robustly quantify the uncertainty in estimating model parameters and proposed interpretable metrics to quantify model identifiability. These efforts enabled the rigorous reporting of data quality for individual sequences in k-Seq experiments. Here we present detailed protocols, define critical experimental factors, and identify general guidelines to maximize the number of sequences and their measurement accuracy from k-Seq data. Analogous practices could be applied to improve the rigor of other sequencing-based assays.
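As an illustration of the bootstrapping idea mentioned in the abstract, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes a pseudo-first-order model in which the reacted fraction of one sequence is A(1 - exp(-k c)) at substrate concentration c. It also assumes case resampling of data points as the bootstrap scheme, which the abstract does not specify. The function names, the synthetic data, and the grid-search fit are all hypothetical choices made for this example.

```python
import math
import random


def model(c, A, k):
    """Assumed model: reacted fraction at substrate concentration c."""
    return A * (1.0 - math.exp(-k * c))


def fit(concs, fracs, k_grid):
    """Least-squares fit by scanning k; for a fixed k the best A is closed-form."""
    best = None
    for k in k_grid:
        g = [1.0 - math.exp(-k * c) for c in concs]
        A = sum(f * x for f, x in zip(fracs, g)) / sum(x * x for x in g)
        A = min(max(A, 0.0), 1.0)  # keep the amplitude in a physical range
        sse = sum((f - A * x) ** 2 for f, x in zip(fracs, g))
        if best is None or sse < best[0]:
            best = (sse, A, k)
    return best[1], best[2]


def bootstrap(concs, fracs, k_grid, n_boot=200, seed=0):
    """Refit on data points resampled with replacement (case resampling)."""
    rng = random.Random(seed)
    n = len(concs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        cs = [concs[i] for i in idx]
        if len(set(cs)) < 2:
            continue  # a single concentration cannot separate A from k
        estimates.append(fit(cs, [fracs[i] for i in idx], k_grid))
    return estimates


def percentile_interval(values, lo=2.5, hi=97.5):
    """Percentile interval taken from the sorted bootstrap estimates."""
    s = sorted(values)

    def pick(p):
        return s[min(len(s) - 1, int(round(p / 100.0 * (len(s) - 1))))]

    return pick(lo), pick(hi)


# Synthetic data for one hypothetical sequence (true A = 0.8, k = 0.02).
rng = random.Random(1)
concs = [2.0, 10.0, 50.0, 250.0] * 3  # hypothetical concentrations, triplicates
fracs = [model(c, 0.8, 0.02) + rng.gauss(0.0, 0.02) for c in concs]
k_grid = [10 ** (-4 + 4 * i / 199) for i in range(200)]  # log-spaced candidates

A_hat, k_hat = fit(concs, fracs, k_grid)
boot = bootstrap(concs, fracs, k_grid)
k_lo, k_hi = percentile_interval([k for _, k in boot])
print(f"k = {k_hat:.4f}, 95% bootstrap interval = ({k_lo:.4f}, {k_hi:.4f})")
```

A wide bootstrap interval for k, or estimates that pile up at the edges of the search grid, would suggest that the parameters of that sequence are poorly identified from the available data. This is the kind of per-sequence quality reporting the abstract describes.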