Department of Public Health Sciences and Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA.
PLoS Comput Biol. 2013;9(7):e1003153. doi: 10.1371/journal.pcbi.1003153. Epub 2013 Jul 18.
Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command-line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family-based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility, and our goal is to provide researchers with a standard framework for medical genomics.
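As a rough illustration of what composing a query over variants, annotations, and sample genotypes in one database can look like, the sketch below builds a toy in-memory SQLite database and selects candidate recessive variants in a trio. It is not GEMINI's schema or programming interface: the table names, column names, genotype labels, sample names, and data are all hypothetical and chosen only to show the idea.

```python
import sqlite3


def build_demo_db():
    """Create a toy database; the schema and rows are hypothetical, not GEMINI's."""
    conn = sqlite3.connect(":memory:")
    # One row per variant, with annotation columns stored alongside it.
    conn.execute(
        "CREATE TABLE variants ("
        "variant_id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER, "
        "ref TEXT, alt TEXT, gene TEXT, in_dbsnp INTEGER)"
    )
    # One row per (variant, sample) genotype call.
    conn.execute(
        "CREATE TABLE genotypes (variant_id INTEGER, sample TEXT, gt TEXT)"
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "chr1", 1000, "A", "G", "GENE_A", 0),
            (2, "chr2", 2000, "C", "T", "GENE_B", 1),
            (3, "chr3", 3000, "G", "A", "GENE_C", 0),
        ],
    )
    conn.executemany(
        "INSERT INTO genotypes VALUES (?, ?, ?)",
        [
            (1, "kid", "hom_alt"), (1, "mom", "het"), (1, "dad", "het"),
            (2, "kid", "hom_alt"), (2, "mom", "het"), (2, "dad", "het"),
            (3, "kid", "het"), (3, "mom", "het"), (3, "dad", "hom_ref"),
        ],
    )
    return conn


def candidate_recessive(conn, child, mother, father):
    """Return variants that are homozygous alternate in the child,
    heterozygous in both parents, and absent from the dbSNP flag column."""
    query = (
        "SELECT v.gene, v.chrom, v.pos FROM variants v "
        "JOIN genotypes c ON c.variant_id = v.variant_id "
        "  AND c.sample = ? AND c.gt = 'hom_alt' "
        "JOIN genotypes m ON m.variant_id = v.variant_id "
        "  AND m.sample = ? AND m.gt = 'het' "
        "JOIN genotypes f ON f.variant_id = v.variant_id "
        "  AND f.sample = ? AND f.gt = 'het' "
        "WHERE v.in_dbsnp = 0 "
        "ORDER BY v.variant_id"
    )
    return conn.execute(query, (child, mother, father)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(candidate_recessive(db, "kid", "mom", "dad"))
```

In this toy data only the first variant passes all three conditions: the second is excluded by the annotation filter and the third by the inheritance pattern. The filter on genotype, inheritance, and annotation is expressed in a single query because all three kinds of information sit in the same database.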