Taycher Elena, Rolfs Andreas, Hu Yanhui, Zuo Dongmei, Mohr Stephanie E, Williamson Janice, Labaer Joshua
Harvard Institute of Proteomics, Harvard Medical School, Cambridge, MA 02141, USA.
BMC Bioinformatics. 2007 Jun 13;8:198. doi: 10.1186/1471-2105-8-198.
Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time-consuming process. The ultimate goal of validation is to determine whether a given plasmid clone matches its reference sequence sufficiently to be "acceptable" for use in protein expression experiments. Given the accelerating availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation.
We have developed an Automated Clone Evaluation (ACE) system, the first comprehensive, multi-platform, web-based plasmid sequence verification software package. ACE automates the clone verification process by defining each clone sequence as a list of multidimensional discrepancy objects, each describing a difference between the clone and its expected sequence, including the resulting polypeptide consequences. To evaluate clones automatically, this list can be compared against user acceptance criteria that specify the allowable number of discrepancies of each type. This strategy allows users to re-evaluate the same set of clones against different acceptance criteria as needed for use in other experiments. ACE manages the entire sequence validation process, including contig management, identification and annotation of discrepancies, determination of whether discrepancies correspond to polymorphisms, and clone finishing. Designed to manage thousands of clones simultaneously, ACE maintains a relational database to store information about clones at various completion stages, project processing parameters and acceptance criteria. In a direct comparison, the automated analysis by ACE took less time and was more accurate than a manual analysis of a 93-gene clone set.
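The evaluation strategy described above can be illustrated with a minimal sketch. It is not ACE's actual code or data model: the class name `Discrepancy`, its fields, the function `evaluate_clone`, the effect labels, and the rule that unlisted discrepancy types are disallowed are all assumptions made for illustration. The sketch only shows how one discrepancy list might be checked against per-type limits.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """Hypothetical record of one difference between clone and reference."""
    position: int        # 1-based position in the reference sequence
    dna_change: str      # e.g. "substitution", "insertion", "deletion"
    protein_effect: str  # e.g. "silent", "missense", "frameshift"


def evaluate_clone(discrepancies, criteria):
    """Return True if the clone satisfies the acceptance criteria.

    `criteria` maps a protein-effect type to the maximum number of
    discrepancies of that type that is allowed. Types missing from the
    mapping are treated as not allowed at all (an assumption of this sketch).
    """
    counts = Counter(d.protein_effect for d in discrepancies)
    return all(n <= criteria.get(effect, 0) for effect, n in counts.items())


# One clone, described once as a list of discrepancies.
clone = [
    Discrepancy(120, "substitution", "silent"),
    Discrepancy(342, "substitution", "missense"),
]

# Two illustrative sets of acceptance criteria for different experiments.
strict = {"silent": 2}
lenient = {"silent": 2, "missense": 1}

print(evaluate_clone(clone, strict))
print(evaluate_clone(clone, lenient))
```

Because the discrepancy list is stored separately from the criteria, the same clone can be re-evaluated under a different set of limits without repeating the sequence analysis, which is the property the text attributes to this design.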
ACE was designed to facilitate high-throughput clone sequence verification projects. The software has been used successfully to evaluate more than 55,000 clones at the Harvard Institute of Proteomics. It dramatically reduced the time and labor required to evaluate clone sequences and decreased the number of missed sequence discrepancies, which commonly occur during manual evaluation. In addition, ACE helped to reduce the number of sequencing reads needed to achieve adequate coverage for making decisions on clones.