Go M
Adv Biophys. 1985;19:91-131. doi: 10.1016/0065-227x(85)90052-8.
Exon-intron structures of eukaryotic genes were examined closely in relation to the primary and tertiary structures of the proteins they encode. Specific attention was given to the introns of genes encoding proteins that have no repeats in their amino acid sequences. Such introns have been shown to be located at sites corresponding to interdomain or intermodule junctions identified in the three-dimensional structures of the proteins. "Modules," compact structural units within the globular domains of proteins, are identified by drawing a distance map. Intron positions are found to correspond to intermodule junctions in various proteins for which X-ray crystallographic data are available: the globin family, chicken egg-white lysozyme (CEWL), ovomucoid, cytochrome c, alcohol dehydrogenase (ADH), and the trypsin-like serine proteinases. The good correspondence between intron positions and intermodule junctions excludes a mechanism of random intron insertion, because the probability of an intron being inserted at each intermodule junction by chance is extraordinarily small. Intron positions have been very stable and well conserved during evolution. However, no introns are found at some intermodule junctions. In small proteins that have no core module buried in their interior, the modules have a character suited to recruitment through assembly into a stable domain: one side of each module is rich in hydrophobic residues and the other side in hydrophilic residues. In the proteins examined, functionally important residues are scattered across different modules. Based on these observations, a role for modules in the precellular period was conjectured: some modules might have been functionally active by themselves, but most might have been only segments that could function as an active protein when assembled with others. The origin of introns might be traced back to before the divergence of prokaryotes and eukaryotes.
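The abstract says only that modules are identified by drawing a distance map. As a rough illustration, the sketch below computes a residue-by-residue distance map from alpha-carbon coordinates and then splits the chain into contiguous segments with a simple greedy rule: a segment is extended as long as every pair of its residues stays within a cutoff distance. The greedy rule, the `cutoff` parameter, the function names `distance_map` and `greedy_modules`, and the toy coordinates are all hypothetical. The abstract does not give the exact criterion used to delimit modules, so this should not be read as the author's procedure.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Return the symmetric matrix of Euclidean distances between residues."""
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = dmap[j][i] = d
    return dmap


def greedy_modules(dmap: List[List[float]], cutoff: float) -> List[Tuple[int, int]]:
    """Split residues 0..n-1 into contiguous (start, end) segments.

    Hypothetical criterion: residue k joins the current segment only if it lies
    within `cutoff` of every residue already in that segment; otherwise a new
    segment starts at k.
    """
    n = len(dmap)
    modules: List[Tuple[int, int]] = []
    start = 0
    for k in range(1, n):
        if any(dmap[i][k] > cutoff for i in range(start, k)):
            modules.append((start, k - 1))
            start = k
    if n:
        modules.append((start, n - 1))
    return modules


# Toy chain: two compact clusters of four residues each (made-up coordinates).
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
          (20, 0, 0), (21, 0, 0), (22, 0, 0), (23, 0, 0)]
print(greedy_modules(distance_map(coords), cutoff=10.0))  # [(0, 3), (4, 7)]
```

The boundary between consecutive segments (here, between residues 3 and 4) plays the role of an intermodule junction, the kind of position the abstract compares with intron positions. A single left-to-right greedy scan is used only for brevity. It is not guaranteed to reproduce the modules that would be read from the distance map of a real protein.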