Kovarík A, Matzke MA, Matzke AJ, Koulaková B
Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno.
Mol Genet Genomics. 2001 Oct;266(2):216-22. doi: 10.1007/s004380100542.
During recloning of the Nicotiana tabacum L. repetitive sequence R8.3 in Escherichia coli, a modified clone that differed from the original by the insertion of an IS10 sequence was unintentionally produced. The insert was flanked by a 9-bp direct repeat derived from the R8.3 sequence; such a 9-bp duplication of acceptor DNA at the insertion site is characteristic of IS10 transposition events. A database search using the FASTA program revealed IS10 and other prokaryotic IS elements inserted into numerous eukaryotic clones. Unexpectedly, IS10, which is not a natural component of the E. coli genome, appeared to be by far the most frequent contaminant of DNA databases among the several IS sequences tested. In the GenEMBL database, the IS10 query sequence yielded positive scores with more than 500 eukaryotic clones. Insertions of shortened IS10 sequences retaining only one intact terminal inverted repeat were common. Most full-length IS10 insertions (32 of the 40 analyzed) were flanked by 9-bp direct repeats with the consensus 5'-NPuCNN-NGPyN-3' and a strong preference for 5'-TGCTNA-GNN-3'. One insertion was flanked by an inverted repeat more than 400 bp long. PCR amplification and Southern analysis revealed IS10 sequences in E. coli strains commonly used for DNA cloning, including some reported to be Tn10-free. No IS10-specific PCR product was obtained with N. tabacum or human DNA. Our data suggest that transposition of IS10 elements may accompany cloning steps, particularly into large BAC vectors, which might explain the relatively frequent contamination of DNA databases by this bacterial sequence. It is estimated that approximately one in every thousand eukaryotic clones in the databases is contaminated by IS-derived sequences. We recommend checking submitted sequences for the presence of IS10 and other IS elements. In addition, DNA databases should be corrected by removing contaminating IS sequences.
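The target-site analysis in the abstract (a 9-bp duplication of acceptor DNA flanking each full-length insertion, compared against the consensus 5'-NPuCNN-NGPyN-3') can be sketched in Python. This is a minimal illustration, not the authors' procedure. The function names, the 0-based half-open insert coordinates, and the decision to treat the preferred form 5'-TGCTNA-GNN-3' as a refinement applied only to sites that already fit the consensus are assumptions made here. The example sequence is synthetic, and its 12-bp insert is a placeholder rather than IS10 DNA.

```python
import re

# Bracket classes for the abstract's notation: N = any base,
# Pu (purine) = A or G, Py (pyrimidine) = C or T; other tokens are literal bases.
_CLASSES = {"N": "[ACGT]", "Pu": "[AG]", "Py": "[CT]"}


def consensus_to_regex(tokens):
    """Compile a list of consensus tokens (one per position) into a regex."""
    return re.compile("".join(_CLASSES.get(t, t) for t in tokens))


# 5'-NPuCNN-NGPyN-3' as nine positions; the hyphen is not a base.
IS10_CONSENSUS = consensus_to_regex(["N", "Pu", "C", "N", "N", "N", "G", "Py", "N"])
# Preferred form 5'-TGCTNA-GNN-3'.
IS10_PREFERRED = consensus_to_regex(["T", "G", "C", "T", "N", "A", "G", "N", "N"])


def flanking_direct_repeat(seq, ins_start, ins_end, length=9):
    """Return the repeat if the `length` bases just left of the insert equal
    the `length` bases just right of it, else None.

    ins_start/ins_end are 0-based, half-open coordinates of the inserted element.
    """
    if ins_start < length or ins_end + length > len(seq):
        return None
    left = seq[ins_start - length:ins_start].upper()
    right = seq[ins_end:ins_end + length].upper()
    return left if left == right else None


def classify_target_site(tsd):
    """Label a duplication as 'preferred', 'consensus' or 'non-consensus'.

    fullmatch against the 9-position pattern also rejects any length other than 9.
    """
    tsd = tsd.upper()
    if not IS10_CONSENSUS.fullmatch(tsd):
        return "non-consensus"
    return "preferred" if IS10_PREFERRED.fullmatch(tsd) else "consensus"


if __name__ == "__main__":
    # Synthetic example; the 12-bp "insert" is a placeholder, not IS10 sequence.
    tsd = "TGCTCAGCA"
    seq = "ACGTACGTAC" + tsd + "G" * 12 + tsd + "CATGCATGCA"
    repeat = flanking_direct_repeat(seq, 19, 31)
    print(repeat, classify_target_site(repeat))  # TGCTCAGCA preferred
```

Regex classes written as Pu/Py tokens mirror the abstract's notation directly; IUPAC single-letter codes (R, Y) would work equally well. Note that the preferred form leaves position 8 unconstrained, so the sketch reports "preferred" only when a site also satisfies the Py requirement of the consensus.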
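The recommendation to check submitted sequences for IS10 could be approximated, as a crude first pass, by a shared k-mer screen. This is a deliberately simple stand-in for the FASTA search used in the study (or alignment tools such as BLAST): it detects only exact 20-mer matches on either strand. The helper names, k = 20 and the 5% threshold are hypothetical choices, not validated parameters, and the IS10 reference sequence itself must be supplied by the user from a sequence database.

```python
def revcomp(seq):
    """Reverse complement; characters other than A, C, G, T, N pass through."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmer_set(seq, k):
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query, subject, k=20):
    """Fraction of the query's k-mers that occur in subject on either strand."""
    q = kmer_set(query, k)
    if not q:
        return 0.0
    s = kmer_set(subject, k) | kmer_set(revcomp(subject), k)
    return len(q & s) / len(q)


def flag_contaminated(records, query, k=20, threshold=0.05):
    """Sorted names of records whose shared k-mer fraction reaches threshold.

    records: dict mapping a clone name to its sequence.
    k and threshold are illustrative defaults, not validated cut-offs.
    """
    return sorted(
        name for name, seq in records.items()
        if shared_kmer_fraction(query, seq, k) >= threshold
    )
```

For example, `flag_contaminated(clones, is10_seq)` would return the names of clones sharing at least 5% of the query's 20-mers. Because shortened IS10 copies with only one intact terminal inverted repeat were common, a fraction-based threshold would need to be low enough that partial insertions are not missed; hits should then be confirmed by a proper alignment.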