Bradley Michael E, Benner Steven A
Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, FL 32611-7200, USA.
BMC Evol Biol. 2005 Mar 7;5:22. doi: 10.1186/1471-2148-5-22.
Blocks of duplicated genomic DNA sequence longer than 1000 base pairs are known as low copy repeats (LCRs). Identified by their sequence similarity, LCRs are abundant in the human genome and are interesting because they may represent recent adaptive events, or potential future adaptive opportunities, within the human lineage. Sequence analysis tools are needed, however, to decide whether these interpretations are likely, whether a particular set of LCRs instead represents nearly neutral drift creating junk DNA, or whether the apparent LCRs reflect assembly error. Here we investigate an LCR family containing the sulfotransferase (SULT) 1A genes, which are involved in drug metabolism, cancer, hormone regulation, and neurotransmitter biology, as a first step toward defining the problems that such tools must manage.
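The length criterion in the LCR definition above can be expressed as a simple filter over duplication hits. The sketch below is a minimal illustration, not the authors' pipeline: it assumes pairwise self-alignment hits are already available as interval coordinates with a fractional identity, and the `DuplicationHit` type, the `filter_lcrs` function, and the 0.90 identity cutoff are hypothetical choices made only for this example (the abstract specifies the length threshold but no identity value).

```python
from dataclasses import dataclass

# Length criterion taken from the definition: duplicated blocks "longer than 1000 base pairs".
MIN_LENGTH_BP = 1000


@dataclass(frozen=True)
class DuplicationHit:
    """One alignment between two genomic intervals (hypothetical input format)."""
    start_a: int      # 0-based, half-open interval [start_a, end_a)
    end_a: int
    start_b: int      # 0-based, half-open interval [start_b, end_b)
    end_b: int
    identity: float   # fraction of aligned positions that match, 0.0 to 1.0


def is_lcr_candidate(hit: DuplicationHit, min_identity: float = 0.90) -> bool:
    """Return True if both copies exceed the length threshold and are similar enough.

    min_identity is an assumed similarity cutoff, not a value given in the abstract.
    """
    length_a = hit.end_a - hit.start_a
    length_b = hit.end_b - hit.start_b
    # Both copies must be strictly longer than MIN_LENGTH_BP.
    return min(length_a, length_b) > MIN_LENGTH_BP and hit.identity >= min_identity


def filter_lcrs(hits, min_identity: float = 0.90):
    """Keep only the hits that satisfy the LCR-style criteria."""
    return [h for h in hits if is_lcr_candidate(h, min_identity)]


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    example_hits = [
        DuplicationHit(0, 5000, 100000, 105000, 0.98),  # long and similar: kept
        DuplicationHit(0, 800, 2000, 2800, 0.99),       # too short: dropped
        DuplicationHit(0, 3000, 9000, 12000, 0.70),     # too divergent: dropped
    ]
    for hit in filter_lcrs(example_hits):
        print(hit)
```

Requiring both copies to pass the length test keeps a long block from being paired with a short fragment; a real analysis would also need to merge overlapping hits and handle assembly gaps, which this sketch does not attempt.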
Sequence analysis here identified a fourth sulfotransferase gene, which may be transcriptionally active, on human chromosome 16. Four regions of genomic sequence containing the four human SULT1A paralogs defined a new LCR family. The stem hominoid SULT1A progenitor locus was identified by comparative genomics using the complete human and rodent genomes and a draft chimpanzee genome. SULT1A expansion in hominoid genomes was followed by positive selection acting on specific protein sites. This episode of adaptive evolution appears to be responsible for the dopamine sulfonation function of some SULT enzymes. Each conclusion that this bioinformatic analysis drew from data of uncertain reliability (such as data from the chimpanzee genome sequencing project) has since been confirmed, either experimentally or by a "finished" chromosome 16 assembly, both of which were published after this manuscript was submitted.
SULT1A genes expanded from one to four copies in hominoids through intra-chromosomal LCR duplications, including (apparently) one after the divergence of chimpanzees and humans. Thus, LCRs may provide a means for amplifying genes (and other genetic elements) that are adaptively useful. Being located on and among LCRs, however, could make the human SULT1A genes susceptible to further duplications or deletions, resulting in 'genomic diseases' in some individuals. Pharmacogenomic studies of SULT1A single nucleotide polymorphisms should therefore also examine SULT1A copy number variability when searching for genotype-phenotype associations. The most recent duplication remains, however, only a substantiated hypothesis; an alternative explanation, disfavored by most of the evidence, is that this duplication is an artifact of incorrect genome assembly.