A novel approach to sequence validating protein expression clones with automated decision making.

Authors

Taycher Elena, Rolfs Andreas, Hu Yanhui, Zuo Dongmei, Mohr Stephanie E, Williamson Janice, Labaer Joshua

Affiliation

Harvard Institute of Proteomics, Harvard Medical School, Cambridge, MA 02141, USA.

Publication

BMC Bioinformatics. 2007 Jun 13;8:198. doi: 10.1186/1471-2105-8-198.

DOI:10.1186/1471-2105-8-198

PMID:17567908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1914086/

Abstract

BACKGROUND

Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time-consuming process. The ultimate goal of validation is to determine whether a given plasmid clone matches its reference sequence sufficiently to be "acceptable" for use in protein expression experiments. Given the accelerating increase in the availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation.

RESULTS

We have developed the Automated Clone Evaluation (ACE) system, the first comprehensive, multi-platform, web-based plasmid sequence verification software package. ACE automates the clone verification process by defining each clone sequence as a list of multidimensional discrepancy objects, each describing a difference between the clone and its expected sequence, including the resulting polypeptide consequences. To evaluate clones automatically, this list can be compared against user acceptance criteria that specify the allowable number of discrepancies of each type. This strategy allows users to re-evaluate the same set of clones against different acceptance criteria as needed for use in other experiments. ACE manages the entire sequence validation process, including contig management, identification and annotation of discrepancies, determination of whether discrepancies correspond to polymorphisms, and clone finishing. Designed to manage thousands of clones simultaneously, ACE maintains a relational database to store information about clones at various completion stages, project processing parameters and acceptance criteria. In a direct comparison on a 93-gene clone set, the automated analysis by ACE took less time and was more accurate than a manual analysis.
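The acceptance step described above, in which a clone's discrepancy list is compared against per-type limits, can be illustrated with a minimal Python sketch. This is not ACE's actual code or data model: the `Discrepancy` fields, the type labels ("silent", "missense"), the `evaluate_clone` function, and the rule that unlisted types are disallowed are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """One difference between a clone and its reference (hypothetical fields)."""
    position: int        # position of the difference in the reference sequence
    dna_change: str      # e.g. "substitution", "insertion", "deletion"
    protein_effect: str  # polypeptide consequence, e.g. "silent", "missense"


def evaluate_clone(discrepancies, max_allowed):
    """Return True if no discrepancy type exceeds its allowed count.

    Assumption: a type missing from max_allowed is allowed zero times.
    """
    counts = Counter(d.protein_effect for d in discrepancies)
    return all(n <= max_allowed.get(effect, 0) for effect, n in counts.items())


# One clone, described once as a list of discrepancies (made-up example data).
clone = [
    Discrepancy(120, "substitution", "silent"),
    Discrepancy(451, "substitution", "missense"),
]

# Two hypothetical sets of acceptance criteria for different experiments.
strict = {"silent": 2, "missense": 0}
relaxed = {"silent": 2, "missense": 1}

print(evaluate_clone(clone, strict))   # rejected: one missense, none allowed
print(evaluate_clone(clone, relaxed))  # accepted: within both limits
```

Because the discrepancy list is stored separately from the criteria, the same clone can be re-evaluated against a new set of limits without re-analysing its sequence, which is the property the abstract highlights.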

CONCLUSION

ACE was designed to facilitate high-throughput clone sequence verification projects. The software has been used successfully to evaluate more than 55,000 clones at the Harvard Institute of Proteomics. It dramatically reduced the time and labor required to evaluate clone sequences and decreased the number of missed sequence discrepancies, which commonly occur during manual evaluation. In addition, ACE helped to reduce the number of sequencing reads needed to achieve adequate coverage for making decisions on clones.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb3e/1914086/7dddb4a18339/1471-2105-8-198-1.jpg
