dbVOR: a database system for importing pedigree, phenotype and genotype data and exporting selected subsets.

Authors

Baron Robert V, Conley Yvette P, Gorin Michael B, Weeks Daniel E

Affiliations

Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA.

Department of Health Promotion and Development, School of Nursing, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA.

Publication information

BMC Bioinformatics. 2015 Mar 18;16(1):91. doi: 10.1186/s12859-015-0505-4.

DOI: 10.1186/s12859-015-0505-4

PMID: 25887129

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4407391/

Abstract

BACKGROUND

When studying the genetics of a human trait, we typically have to manage both genome-wide and targeted genotype data. People and markers can overlap across genotyping experiments, and this overlap can introduce several kinds of problems. Most of the time the overlapping genotypes agree, but sometimes they differ. Occasionally, the lab will return genotypes using a different allele labeling scheme (for example, 1/2 vs. A/C). Sometimes the genotype for a given person/marker pair is unreliable or missing. Further, over time some markers are merged, and bad samples are re-run under a different sample name. We need a consistent picture of the subset of data we have chosen to work with, even though there may be conflicting measurements from multiple data sources.
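The overlap problems listed above can be made concrete with a small sketch. It assumes that each call arrives with a per-marker recoding table for the lab's allele labels, plus hypothetical alias tables that map re-run samples and merged markers to canonical names. The Python below groups calls by canonical person and marker, ignores missing calls, and separates agreeing genotypes from conflicting ones. It is only an illustration, not dbVOR's reconciliation logic; the names `normalize`, `reconcile`, `PERSON_ALIAS` and `MARKER_ALIAS` are invented here.

```python
from collections import defaultdict

# Hypothetical alias tables (assumption): a re-run sample and a merged marker
# are mapped to one canonical name so their calls are compared with the originals.
PERSON_ALIAS = {"S17_rerun": "S17"}
MARKER_ALIAS = {"rs_old_1": "rs1"}


def normalize(call, recode):
    """Recode allele labels (e.g. 1/2 -> A/C) and return an unordered genotype.

    Returns None for a missing call. `recode` is an assumed per-marker,
    per-lab mapping; labels not in it are kept unchanged.
    """
    if call is None:
        return None
    return tuple(sorted(recode.get(allele, allele) for allele in call))


def reconcile(calls):
    """Group calls by canonical (person, marker) and split agreeing from conflicting.

    Each call is a tuple (source, person, marker, call, recode).
    """
    seen = defaultdict(set)
    for _source, person, marker, call, recode in calls:
        genotype = normalize(call, recode)
        if genotype is None:
            continue  # a missing call neither confirms nor contradicts others
        key = (PERSON_ALIAS.get(person, person), MARKER_ALIAS.get(marker, marker))
        seen[key].add(genotype)
    consensus, conflicts = {}, {}
    for key, genotypes in seen.items():
        if len(genotypes) == 1:
            consensus[key] = next(iter(genotypes))
        else:
            conflicts[key] = sorted(genotypes)
    return consensus, conflicts


calls = [
    ("gwas", "S17", "rs1", ("A", "C"), {}),
    ("panel", "S17_rerun", "rs_old_1", ("1", "2"), {"1": "A", "2": "C"}),
    ("gwas", "S20", "rs1", ("A", "A"), {}),
    ("panel", "S20", "rs1", ("A", "C"), {}),
    ("panel", "S21", "rs1", None, {}),
]
consensus, conflicts = reconcile(calls)
print(consensus)  # {('S17', 'rs1'): ('A', 'C')}
print(conflicts)  # {('S20', 'rs1'): [('A', 'A'), ('A', 'C')]}
```

Here the re-run sample's 1/2 call agrees with the original A/C call once both are recoded and renamed. The two different calls for S20 are flagged instead of one silently winning. Genotypes are stored as sorted tuples so that A/C and C/A compare as equal.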

RESULTS

We have developed the dbVOR database, which is designed to hold data efficiently for both genome-wide and targeted experiments. The data are indexed for fast retrieval by person and marker. In addition, we store pedigree and phenotype data for our subjects. The dbVOR database allows us to select subsets of the data by several different criteria and to merge their results into a coherent and consistent whole. Data may be filtered by: family, person, trait value, markers, chromosomes, and chromosome ranges. The results can be presented in columnar, Mega2, or PLINK format.
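The abstract does not describe dbVOR's internal schema. As a rough illustration of indexing genotypes by both person and marker, and of filtering by family and chromosome range, here is a minimal sketch that uses Python's built-in `sqlite3` module. The tables, columns, and helper functions (`select_subset`, `to_plink`) are hypothetical and do not reflect dbVOR's actual design. The output follows PLINK's text PED/MAP layout, which uses 0 for unknown parents and missing alleles and -9 for a missing phenotype.

```python
import sqlite3

# Hypothetical minimal schema (assumption), NOT dbVOR's actual schema.
SCHEMA = """
CREATE TABLE person (person_id TEXT PRIMARY KEY, family_id TEXT,
                     father TEXT, mother TEXT, sex INTEGER, trait REAL);
CREATE TABLE marker (marker_id TEXT PRIMARY KEY, chrom INTEGER, pos INTEGER);
CREATE TABLE genotype (person_id TEXT, marker_id TEXT, allele1 TEXT, allele2 TEXT,
                       PRIMARY KEY (person_id, marker_id));
-- The primary key indexes lookups by person; this index serves lookups by marker.
CREATE INDEX genotype_by_marker ON genotype (marker_id, person_id);
CREATE INDEX marker_by_position ON marker (chrom, pos);
"""


def select_subset(conn, families, chrom, start, end):
    """Return markers on `chrom` within [start, end] and persons in `families`."""
    placeholders = ",".join("?" * len(families))
    markers = conn.execute(
        "SELECT chrom, marker_id, pos FROM marker "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end)).fetchall()
    persons = conn.execute(
        "SELECT family_id, person_id, father, mother, sex, trait FROM person "
        f"WHERE family_id IN ({placeholders}) ORDER BY family_id, person_id",
        tuple(families)).fetchall()
    return markers, persons


def to_plink(conn, markers, persons):
    """Render PLINK-style .map and .ped lines; absent genotypes become '0 0'."""
    map_lines = [f"{c} {m} 0 {p}" for c, m, p in markers]
    ped_lines = []
    for fam, pid, father, mother, sex, trait in persons:
        fields = [fam, pid, father or "0", mother or "0", str(sex),
                  "-9" if trait is None else str(trait)]
        for _chrom, marker_id, _pos in markers:
            row = conn.execute(
                "SELECT allele1, allele2 FROM genotype "
                "WHERE person_id = ? AND marker_id = ?", (pid, marker_id)).fetchone()
            fields += list(row) if row else ["0", "0"]
        ped_lines.append(" ".join(fields))
    return map_lines, ped_lines


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO person VALUES (?,?,?,?,?,?)", [
    ("p1", "F1", None, None, 1, 0.5),
    ("p2", "F1", None, None, 2, None),
    ("p3", "F1", "p1", "p2", 2, 1.5),
    ("p9", "F2", None, None, 1, 0.7),
])
conn.executemany("INSERT INTO marker VALUES (?,?,?)",
                 [("rs1", 1, 1000), ("rs2", 1, 5000), ("rs3", 2, 800)])
conn.executemany("INSERT INTO genotype VALUES (?,?,?,?)", [
    ("p1", "rs1", "A", "C"), ("p3", "rs1", "C", "C"),
    ("p3", "rs2", "G", "T"), ("p9", "rs1", "A", "A"),
])
markers, persons = select_subset(conn, ["F1"], chrom=1, start=0, end=6000)
map_lines, ped_lines = to_plink(conn, markers, persons)
print("\n".join(map_lines))
print("\n".join(ped_lines))
```

For family F1 and chromosome 1 positions 0 to 6000, this prints two MAP lines (rs1 and rs2) and three PED lines. Family F2 and marker rs3 are filtered out. Person p2 has no stored genotypes, so all of its allele columns are written as 0. Issuing one query per genotype keeps the sketch readable; a real exporter would fetch each person's genotypes in a single indexed query.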

CONCLUSIONS

dbVOR serves our needs well. It is freely available from https://watson.hgen.pitt.edu/register . Documentation for dbVOR can be found at https://watson.hgen.pitt.edu/register/docs/dbvor.html .
