Sánchez Joaquín
Facultad de Medicina, UAEM, Calle Ixtaccihuatl Esq Leñeros, Col. Los Volcanes C.P. 62350, Cuernavaca, Morelos, Mexico.
Bioinformation. 2013 May 8;9(10):511-7. doi: 10.6026/97320630009511. Print 2013.
The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequence. For practical purposes, SEIP might also be manipulated, for example for heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out whether they could be used to create new coding sequences. We attempted to replace human codons with E. coli codons via dicodon exchange but found that full replacement was not possible, which indicated robust species-specific dicodon tendencies. To test another form of codon replacement, we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and then reconstructed the GFP coding DNA with human tetrapeptide-coding sequences. The results provide proof of principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct, in pieces, a protein-coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression.
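The SEIP idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the window length `k`, the use of the standard genetic code, and the two toy sequences are all assumptions made for illustration; the toy sequences are invented and are not real human or E. coli genes. The sketch also assumes that "intercodon dinucleotide" means the last base of one codon followed by the first base of the next.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame coding sequence into codons, dropping any partial tail."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    """Translate an in-frame coding sequence into a peptide string."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def peptide_index(cds, k):
    """Map each k-residue peptide (no stop codons) to the DNA windows encoding it."""
    cs = codons(cds)
    index = {}
    for i in range(len(cs) - k + 1):
        window = cs[i:i + k]
        peptide = "".join(CODON_TABLE[c] for c in window)
        if "*" not in peptide:
            index.setdefault(peptide, set()).add("".join(window))
    return index


def find_seip(cds_a, cds_b, k=4):
    """Return (peptide, dna_a, dna_b) for every k-residue peptide encoded in both inputs."""
    index_a, index_b = peptide_index(cds_a, k), peptide_index(cds_b, k)
    return [
        (peptide, dna_a, dna_b)
        for peptide in sorted(index_a.keys() & index_b.keys())
        for dna_a in sorted(index_a[peptide])
        for dna_b in sorted(index_b[peptide])
    ]


def codon_usage(seqs):
    """Count codon occurrences over a collection of in-frame sequences."""
    return Counter(c for s in seqs for c in codons(s))


def intercodon_dinucleotides(seqs):
    """Count dinucleotides spanning codon boundaries (3rd base + next 1st base)."""
    counts = Counter()
    for s in seqs:
        cs = codons(s)
        counts.update(a[2] + b[0] for a, b in zip(cs, cs[1:]))
    return counts


# Toy inputs (invented): both encode the peptide MALKE with different codons.
toy_a = "ATGGCCCTGAAGGAG"
toy_b = "ATGGCGCTGAAAGAA"

pairs = find_seip(toy_a, toy_b, k=4)
usage_a = codon_usage(dna_a for _, dna_a, _ in pairs)
usage_b = codon_usage(dna_b for _, _, dna_b in pairs)
```

Because each pair in `pairs` encodes the same peptide, any difference between `usage_a` and `usage_b`, or between the corresponding intercodon dinucleotide counts, reflects the coding DNA rather than the protein sequence, which is the property the abstract relies on.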