ENGINES: exploring single nucleotide variation in entire human genomes.

Affiliation

Grupo de Medicina Xenómica, CIBERER, Universidade de Santiago de Compostela, Santiago de Compostela, Galicia, Spain.

Publication information

BMC Bioinformatics. 2011 Apr 19;12:105. doi: 10.1186/1471-2105-12-105.

Abstract

BACKGROUND

Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data.

DESCRIPTION

We have developed a genetic variant site explorer able to retrieve data on Single Nucleotide Variants (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs) and to derive summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows each available population sample to be combined with and compared against the others, and supports searches by rs-number list, chromosome region or genes of interest. Frequency and FST filters are available to further refine queries, and results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen.
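The abstract does not describe the data mart schema or the query code. As a minimal sketch, assuming each SNV has already been pre-summarized into one row holding per-population alternate-allele frequencies and a single FST value, a chromosome-region query with frequency and FST filters could look like the following. The `SnvSummary` record, the `query_region` function and their fields are hypothetical illustrations, not the ENGINES implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnvSummary:
    """One pre-computed row of a hypothetical data mart table (assumed layout)."""
    rsid: str
    chrom: str
    pos: int
    freq: dict[str, float]  # population label -> alternate-allele frequency
    fst: float              # differentiation across the populations summarized


def query_region(rows, chrom, start, end, population, min_maf=0.0, min_fst=0.0):
    """Return rows on `chrom` within [start, end] whose minor allele frequency
    in `population` is >= min_maf and whose FST is >= min_fst (hypothetical API)."""
    hits = []
    for row in rows:
        # Region filter: wrong chromosome or outside the interval -> skip.
        if row.chrom != chrom or not (start <= row.pos <= end):
            continue
        p = row.freq.get(population)
        if p is None:  # site not summarized for this population
            continue
        maf = min(p, 1.0 - p)  # fold the alternate-allele frequency to the minor allele
        if maf >= min_maf and row.fst >= min_fst:
            hits.append(row)
    return hits
```

Pre-computing frequencies and FST once per site is what lets such filters run as simple scans or indexed lookups at query time, rather than touching billions of raw genotypes for every request.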

CONCLUSIONS

ENGINES provides fast and comprehensive access to large-scale variation data repositories. It allows quick browsing of whole-genome variation while providing statistical information for each variant site, such as allele frequency, heterozygosity or FST values for genetic differentiation. The scripts that generate the data mart, as well as the web interface, are available at http://spsmart.cesga.es/engines.php.
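The per-site statistics mentioned above can be illustrated for a single biallelic SNV. This sketch assumes diploid genotypes coded as alternate-allele counts (0, 1 or 2), uses expected heterozygosity 2p(1 - p), and uses Wright's FST = (HT - HS) / HT with every population weighted equally. The abstract does not say which heterozygosity measure or FST estimator ENGINES applies, so these choices and the function names are assumptions.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) under Hardy-Weinberg proportions."""
    return 2.0 * p * (1.0 - p)


def wright_fst(freqs):
    """Wright-style FST = (HT - HS) / HT from per-population frequencies.

    HS: mean within-population expected heterozygosity (equal weights, assumed).
    HT: expected heterozygosity at the mean frequency of all populations.
    Returns 0.0 for a site that is monomorphic overall (HT == 0).
    """
    hs = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    ht = expected_heterozygosity(p_bar)
    return 0.0 if ht == 0 else (ht - hs) / ht
```

For example, two populations with frequencies 0.25 and 0.75 give HS = 0.375 and HT = 0.5, so FST = 0.25; identical frequencies give FST = 0.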
