Alexander E. Vinogradov
Institute of Cytology, Russian Academy of Sciences, St. Petersburg 194064, Russia.
Genome Res. 2006 Mar;16(3):347-54. doi: 10.1101/gr.4318206. Epub 2006 Feb 3.
Introns are shorter in housekeeping genes than in tissue- or development-specific genes. Differing explanations have been offered for this phenomenon: selection for economy (in housekeeping genes), mutation bias, or "genomic design." The large-scale application in this paper of a rigorous local sequence alignment algorithm revealed an unprecedented fraction of evolutionarily conserved DNA in human-mouse introns (approximately 60% of human and approximately 70% of mouse intron length remained after masking for lineage-specific repeats). The length distributions of both conserved and nonconserved regions are very broad but show peaks close to the lengths of nucleosomal and dinucleosomal DNA. Both the fraction of conserved sequence and its absolute length were higher in introns of tissue-specific genes than in those of housekeeping genes. This difference remained after controlling for the between-species identity of the conserved fraction, mutation rate, and GC content. In a more direct control, the product of the conserved sequence fraction and the between-species identity of this fraction (which can be regarded as the fraction of conserved nucleotides) was greater in introns of tissue-specific genes than in those of housekeeping genes. Neither the fraction of intron length covered by repeats nor the balance of small insertions and deletions (indels) can explain the greater length of introns in tissue-specific genes. The length of the conserved intronic DNA in a gene is correlated with the number of functional domains in the protein encoded by that gene. These results suggest that the greater length of introns in tissue-specific genes is not due to selection for economy or mutation bias but is instead related to functional complexity (probably mediated by chromatin condensation), and that the evolution of the bulk of noncoding DNA is not completely neutral.
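The "more direct control" multiplies two per-intron quantities: the fraction of intron length lying in conserved regions, and the between-species identity within those regions. The sketch below illustrates only that arithmetic. It assumes each intron has already been masked for lineage-specific repeats and aligned, and that the results are summarized as three numbers; the class name `IntronSummary` and its fields are hypothetical and are not the paper's pipeline or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntronSummary:
    """Hypothetical per-intron summary; field names are illustrative only."""
    masked_length: int      # intron length (bp) after masking lineage-specific repeats
    conserved_length: int   # bp lying in conserved (locally aligned) regions
    identity: float         # between-species identity within those regions, 0 to 1

    def __post_init__(self) -> None:
        # Reject values that would make the fractions meaningless.
        if self.masked_length <= 0:
            raise ValueError("masked_length must be positive")
        if not 0 <= self.conserved_length <= self.masked_length:
            raise ValueError("conserved_length must lie in [0, masked_length]")
        if not 0.0 <= self.identity <= 1.0:
            raise ValueError("identity must lie in [0, 1]")

    @property
    def conserved_fraction(self) -> float:
        """Fraction of the masked intron length covered by conserved regions."""
        return self.conserved_length / self.masked_length

    @property
    def conserved_nucleotide_fraction(self) -> float:
        """Conserved fraction times identity: an estimate of the fraction of conserved nucleotides."""
        return self.conserved_fraction * self.identity


# Illustrative numbers only, not data from the paper.
intron = IntronSummary(masked_length=10_000, conserved_length=6_000, identity=0.8)
print(intron.conserved_fraction, intron.conserved_nucleotide_fraction)
```

Taking the product discounts the conserved fraction by how similar those regions actually are between species, so two introns with the same conserved fraction but different identities are no longer treated as equivalent.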