State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China.
Appl Environ Microbiol. 2012 Jul;78(14):4795-801. doi: 10.1128/AEM.00340-12. Epub 2012 Apr 27.
We have designed a high-throughput system for identifying novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire mixed plasmid-enriched genomic sequence of B. thuringiensis by next-generation sequencing, and (ii) to identify cry genes with a computational pipeline (BtToxin_scanner). The pipeline combines three well-established prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (on genomic sequences downloaded from GenBank, it detected 40% more protein toxin genes than a keyword extraction method), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. Plasmid-enriched genomic DNA was extracted from these strains and pooled for Illumina sequencing. The sequencing data were assembled de novo, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained by PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
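The candidate-selection logic summarized above (pooling hits from three predictors, then keeping sequences with low identity to known cry genes) can be sketched as follows. This is a minimal illustration, not the BtToxin_scanner implementation: the function names, the input shapes, and the selection cutoff are hypothetical, and the 45%/78%/95% identity boundaries are taken from the general Cry toxin nomenclature scheme rather than from this abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Assumed rank boundaries (% amino acid identity) from the Cry nomenclature
# scheme; they are not stated in the abstract itself.
PRIMARY, SECONDARY, TERTIARY = 45.0, 78.0, 95.0


@dataclass
class Candidate:
    seq_id: str
    methods: Set[str] = field(default_factory=set)  # predictors that flagged it
    identity: Optional[float] = None  # best % identity to a known Cry protein


def merge_hits(blast: Dict[str, float], hmm: Set[str], svm: Set[str]) -> Dict[str, Candidate]:
    """Take the union of the three predictors' hits (hypothetical input shapes)."""
    merged: Dict[str, Candidate] = {}
    for seq_id, identity in blast.items():
        cand = merged.setdefault(seq_id, Candidate(seq_id))
        cand.methods.add("blast")
        cand.identity = identity
    for name, hits in (("hmm", hmm), ("svm", svm)):
        for seq_id in hits:
            merged.setdefault(seq_id, Candidate(seq_id)).methods.add(name)
    return merged


def novelty_rank(identity: Optional[float]) -> str:
    """Map best identity to the nomenclature rank at which a new name would start."""
    if identity is None:
        return "no BLAST hit"
    if identity < PRIMARY:
        return "primary"
    if identity < SECONDARY:
        return "secondary"
    if identity < TERTIARY:
        return "tertiary"
    return "quaternary"


def select_novel(merged: Dict[str, Candidate], max_identity: float) -> List[str]:
    """Keep candidates whose identity is unknown or below a chosen cutoff."""
    return sorted(
        c.seq_id for c in merged.values()
        if c.identity is None or c.identity < max_identity
    )
```

Calling `select_novel(merge_hits(blast, hmm, svm), max_identity=95.0)` would return the identifiers worth following up by PCR under these assumptions; candidates flagged only by the HMM or SVM step are kept because their identity to known genes is unknown.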