Cheng Feng, Chen Wei, Richards Elliott, Deng Libin, Zeng Changqing
Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China.
BMC Evol Biol. 2009 Sep 5;9:221. doi: 10.1186/1471-2148-9-221.
Positive selection is a driving force that has shaped modern humans. Recent developments in high-throughput technologies and the corresponding statistical tools have made it possible to conduct whole-genome surveys at population scale, and a variety of measures, such as heterozygosity (HET), FST, and Tajima's D, have been applied to multiple datasets to identify signals of positive selection. However, combining various types of data from individual sources has required great effort, and incompatibility among datasets has been a common problem. SNP@Evolution, a new database that integrates multiple datasets, will greatly assist future work in this area.
As part of our research scanning for evolutionary signals in the HapMap Phase II and Phase III datasets, we built SNP@Evolution as a multi-aspect database focused on positive selection. Among its features, SNP@Evolution provides computed FST and HET for all HapMap SNPs, for qualified genes (those with 5 or more HapMap SNPs), and for all autosomal regions detected by whole-genome window scanning. To capture multiple selection signals across the genome, selection-signal enrichment strength (ES) values for HET and FST, together with P-values for iHS, have been calculated for most annotated genes and integrated within one frame so that users can search for outliers. Genes with significant ES values or P-values (with thresholds of 0.95 and 0.05, respectively) are highlighted in color. Low-diversity chromosome regions were detected by sliding a 100 kb window in 10 kb steps. To make this information easy to disseminate, a graphical user interface (GBrowser) was constructed with the Generic Model Organism Database (GMOD) toolkit.
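The per-SNP measures and the window scan described above can be illustrated with a minimal sketch. The abstract does not state which estimators were used, so this example assumes expected heterozygosity 2p(1-p) for a biallelic SNP and Wright's FST = (H_T - H_S) / H_T with equally weighted populations. The function names, the input layout, and the choice to skip windows without SNPs are all hypothetical and are not taken from SNP@Evolution.

```python
from typing import List, Tuple


def expected_het(p: float) -> float:
    """Expected heterozygosity 2p(1-p) of a biallelic SNP (assumed estimator)."""
    return 2.0 * p * (1.0 - p)


def fst(freqs: List[float]) -> float:
    """Wright's FST = (H_T - H_S) / H_T, populations weighted equally (assumed)."""
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_het(p_bar)  # heterozygosity of the pooled population
    if h_t == 0.0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    h_s = sum(expected_het(p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t


def sliding_window_het(
    snps: List[Tuple[int, float]],
    chrom_length: int,
    window: int = 100_000,  # 100 kb window, as in the text
    step: int = 10_000,     # 10 kb step, as in the text
) -> List[Tuple[int, int, float, int]]:
    """Return (start, end, mean HET, SNP count) for each window holding at least one SNP.

    snps is a list of (position, allele frequency) pairs on one chromosome.
    """
    results = []
    start = 0
    while start < chrom_length:
        end = start + window
        values = [expected_het(p) for pos, p in snps if start <= pos < end]
        if values:  # windows without SNPs are skipped (a choice made for this sketch)
            results.append((start, end, sum(values) / len(values), len(values)))
        start += step
    return results
```

Windows with unusually low mean HET relative to the genome-wide distribution would then be candidates for low-diversity regions. How the database ranks or thresholds windows is not specified in this abstract.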
Available at http://bighapmap.big.ac.cn, SNP@Evolution is a hierarchical database focused on positive selection in the human genome. Based on HapMap Phase II and Phase III data, respectively, it includes 3,619,226 and 1,389,498 SNPs with computed HET and FST, as well as 21,859 and 21,099 qualified genes with ES values for HET and FST. Window scanning for selection signals identified 1,606 and 10,138 large low-HET regions in at least one HapMap population group. Among the Phase II and Phase III geographical groups, 660 and 464 regions, respectively, show strong differentiation.