Woldring Daniel R, Holec Patrick V, Hackel Benjamin J
Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota, 55455.
Proteins. 2016 Jul;84(7):869-74. doi: 10.1002/prot.25040. Epub 2016 Apr 16.
ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single-site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. In addition, the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing introduces unintended biases in sequence frequency. ScaffoldSeq addresses this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. © 2016 Wiley Periodicals, Inc.
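The general idea of locating a diversified region by its conserved flanking framework, then tallying site-wise amino acid frequencies with dominant clones dampened, can be sketched as follows. This is a minimal illustration and not the ScaffoldSeq implementation: the function names, the exact-match framework search, and the `count ** damping` weighting are all assumptions made for this example, and pairwise frequencies, background removal, and clustering are not covered.

```python
import re
from collections import Counter


def extract_region(protein, left_frame, right_frame):
    """Return the stretch between two conserved framework motifs, or None.

    Hypothetical helper: uses exact matching of the flanking motifs, so the
    diversified region may have any length without requiring an alignment.
    """
    pattern = re.escape(left_frame) + r"(.*?)" + re.escape(right_frame)
    match = re.search(pattern, protein)
    return match.group(1) if match else None


def site_frequencies(regions, damping=1.0):
    """Per-site amino acid frequencies for regions of one common length.

    Each unique sequence contributes count ** damping (an assumed weighting):
    damping=1.0 counts every read, damping=0.0 counts each unique clone once.
    """
    clone_counts = Counter(regions)
    length = len(regions[0])
    totals = [Counter() for _ in range(length)]
    for seq, count in clone_counts.items():
        if len(seq) != length:
            raise ValueError("all regions must share one length")
        weight = count ** damping
        for position, residue in enumerate(seq):
            totals[position][residue] += weight
    return [
        {residue: w / sum(site.values()) for residue, w in site.items()}
        for site in totals
    ]


# Made-up reads: three copies of one dominant clone, one rarer clone,
# one clone with a longer loop, and one read lacking the framework.
reads = ["MKTAYGSWQLVE"] * 3 + ["MKTAHGTWQLVE", "MKTADDGSWQLVE", "XXXX"]
regions = [extract_region(r, "MKTA", "WQLVE") for r in reads]

# Group by region length so each length class is tallied separately.
by_length = {}
for region in regions:
    if region is not None:
        by_length.setdefault(len(region), []).append(region)

raw = site_frequencies(by_length[3], damping=1.0)
damped = site_frequencies(by_length[3], damping=0.0)
print(raw[0], damped[0])
```

With every read counted, the dominant clone holds three quarters of the weight at the first site; with `damping=0.0` the two unique clones are weighted equally. Intermediate values such as 0.5 would reduce, rather than eliminate, the influence of clone abundance.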