McCauley Jacob L, Kenealy Shannon J, Margulies Elliott H, Schnetz-Boutaud Nathalie, Gregory Simon G, Hauser Stephen L, Oksenberg Jorge R, Pericak-Vance Margaret A, Haines Jonathan L, Mortlock Douglas P
Center for Human Genetics Research and Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, USA.
BMC Genomics. 2007 Aug 6;8:266. doi: 10.1186/1471-2164-8-266.
Although genes play a key role in many complex diseases, the specific genes involved remain largely unidentified. Their discovery will hinge on identifying key sequence variants that are conclusively associated with disease. While much attention has focused on variants in protein-coding DNA, variants in noncoding regions may also play important roles in complex disease by altering gene regulation. Because the vast majority of noncoding genomic sequence is of unknown function, identifying "functional" variants that cause disease is correspondingly more challenging. However, evolutionary conservation can serve as a guide to regions of noncoding or coding DNA that are likely to have biological function and thus may be more likely to harbor SNP variants with functional consequences. To bias marker selection in favor of such variants, we devised a process that prioritizes annotated SNPs for genotyping studies based on their location within Multi-species Conserved Sequences (MCSs), and we used this process to select SNPs in a region of linkage to a complex disease. This allowed us to evaluate the utility of the chosen SNPs for further association studies. Previously, a region of chromosome 1q43 was linked to multiple sclerosis (MS) in a genome-wide screen. We chose annotated SNPs in this region based on their location within MCSs (termed MCS-SNPs). We then obtained genotypes for 478 MCS-SNPs in 989 individuals from MS families.
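The selection step described above, keeping annotated SNPs that fall inside an MCS, amounts to a point-in-interval test. The following is a minimal sketch of that idea, not the authors' actual pipeline: the function names, the 0-based half-open coordinate convention, and the SNP identifiers and positions are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed convention: 0-based, half-open [start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def select_mcs_snps(snps: List[Tuple[str, int]],
                    mcs_intervals: List[Interval]) -> List[str]:
    """Return IDs of SNPs whose position lies inside any MCS interval."""
    merged = merge_intervals(mcs_intervals)
    starts = [start for start, _ in merged]
    selected = []
    for snp_id, pos in snps:
        # Find the last merged interval that starts at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            selected.append(snp_id)
    return selected


# Hypothetical data on a single chromosome (not taken from the study).
mcs = [(100, 150), (140, 200), (500, 520)]
snps = [("snp_a", 120), ("snp_b", 200), ("snp_c", 505), ("snp_d", 50)]
mcs_snps = select_mcs_snps(snps, mcs)
print(mcs_snps)
```

Merging the intervals first lets each SNP be checked with a single binary search, which matters when a linkage region contains thousands of annotated SNPs and conserved elements.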
Analysis of our MCS-SNP genotypes from the 1q43 region and comparison to HapMap data confirmed that annotated SNPs in MCS regions are frequently polymorphic and show subtle signatures of selective pressure, consistent with previous reports of genome-wide variation in conserved regions. We also present an online tool that allows MCS data to be directly exported to the UCSC genome browser so that MCS-SNPs can be easily identified within genomic regions of interest.
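The abstract does not specify the format in which the online tool exports MCS data. As an illustration only, the sketch below assumes the export is a BED custom track, a format the UCSC genome browser accepts for user-supplied annotations. The function name, track name, and coordinates are hypothetical and do not describe the authors' tool.

```python
from typing import List, Tuple


def mcs_to_bed_track(chrom: str,
                     intervals: List[Tuple[int, int]],
                     track_name: str = "MCS") -> str:
    """Format MCS intervals as a BED custom track (0-based, half-open)."""
    # The track line carries display metadata for the browser.
    lines = [f'track name="{track_name}" '
             f'description="Multi-species Conserved Sequences"']
    for n, (start, end) in enumerate(sorted(intervals), start=1):
        # BED columns: chrom, chromStart, chromEnd, name (tab-separated).
        lines.append(f"{chrom}\t{start}\t{end}\tMCS_{n}")
    return "\n".join(lines) + "\n"


# Hypothetical intervals; real MCS coordinates would come from the tool.
bed_text = mcs_to_bed_track("chr1", [(500, 520), (100, 150)])
print(bed_text, end="")
```

Text in this form can be pasted into the browser's custom track page, where the MCS track can be viewed alongside the browser's SNP annotation to spot MCS-SNPs in a region of interest.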
Our results show that MCSs can readily be used to prioritize markers for follow-up and candidate gene association studies. We believe that this novel approach offers a paradigm for expediting the search for genes that contribute to complex diseases.