Karlin S, Ghandour G, Foulser D E
Department of Mathematics, Stanford University.
Mol Biol Evol. 1985 Jan;2(1):35-52. doi: 10.1093/oxfordjournals.molbev.a040336.
A comparative analysis of human, mouse, and rabbit immunoglobulin (Ig) kappa-gene DNA sequences is presented. New formulas for determining the expected length and variance of the longest block identity (a succession of matching nucleotides) between multiple random sequences are given and are used to establish statistical criteria for ascertaining the significance of block identities shared in r out of s sequences. The statistically significant block identities within and between the Ig-kappa-gene sequences are ascertained, and alignment maps based on these similarities are constructed. The human and rabbit sequences (especially in the noncoding regions) and the human and mouse sequences (in the coding regions) show a similarity much stronger than that between the mouse and rabbit sequences. The existence of several highly significant shared oligonucleotides occurring in alignment with each other or with respect to the J- and C-gene segments suggests a configuration of multiple control sites. Discussion and interpretations of the form and distribution of the block identities are given.
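The abstract does not reproduce the formulas for the expected length and variance of the longest block identity, so the following is only a minimal Monte Carlo sketch of the quantities they describe, not the authors' method. It assumes two sequences (the simplest case of r out of s), independent and uniformly distributed nucleotides, and a block identity that may occur at any position in each sequence. The function names `longest_block_identity` and `simulate_longest_block` are hypothetical and chosen for illustration.

```python
import random
import statistics


def longest_block_identity(a: str, b: str) -> int:
    """Return the length of the longest run of consecutive matching
    characters shared by a and b at any pair of positions
    (the longest common substring), using dynamic programming."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous position in both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best:
                    best = curr[j]
        prev = curr
    return best


def simulate_longest_block(n: int, trials: int, seed: int = 0):
    """Estimate the mean and sample variance of the longest block identity
    between two random sequences of length n.

    Assumption: nucleotides are i.i.d. and uniform over A, C, G, T,
    which real genomic sequences generally are not."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        a = "".join(rng.choices("ACGT", k=n))
        b = "".join(rng.choices("ACGT", k=n))
        lengths.append(longest_block_identity(a, b))
    return statistics.mean(lengths), statistics.variance(lengths)


if __name__ == "__main__":
    mean_len, var_len = simulate_longest_block(n=500, trials=50, seed=1)
    print(f"estimated mean: {mean_len:.2f}, estimated variance: {var_len:.2f}")
```

Under these assumptions, an observed block identity that is much longer than the simulated mean, measured against the simulated spread, would be a candidate for statistical significance. This mirrors the role the abstract assigns to the analytical formulas, without claiming to match their values.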