Karolinska Institutet, Stockholm, Sweden.
J Biomed Inform. 2009 Dec;42(6):1029-34. doi: 10.1016/j.jbi.2009.07.005. Epub 2009 Jul 17.
High-throughput genotyping in large-scale epidemiological studies (>10,000 participants) generates large amounts of data, which presents an enormous challenge to researchers in terms of structured data management. To meet this challenge, a system in which genotype data can be stored efficiently has been designed and implemented. The focus has been on enabling researchers to collaborate by sharing genotype data with each other in a secure and controlled way. Individuals can be selected for genotype extraction using phenotype information, and access to specific SNPs can be controlled using user-defined filters. Extensive metadata add further value to the basic genotypic information. Performance testing with both artificial and real-world genotype data shows that the implementation handles large datasets, with extraction time increasing linearly, and that retrieval performance is more than sufficient for near-future genotyping research.
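The abstract does not describe the system's schema or query interface, so the following is only a minimal sketch of the general idea of selecting individuals by phenotype and restricting the result to a permitted set of SNPs. The table names (`individual`, `genotype`), the column names, the SNP identifiers, the `age` phenotype, and the functions `build_demo_db` and `extract_genotypes` are all hypothetical and are not taken from the paper.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with made-up data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE individual (id INTEGER PRIMARY KEY, sex TEXT, age INTEGER);
        CREATE TABLE genotype (individual_id INTEGER, snp TEXT, alleles TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO individual VALUES (?, ?, ?)",
        [(1, "F", 54), (2, "M", 61), (3, "F", 47)],
    )
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [
            (1, "snp_a", "AA"), (1, "snp_b", "AG"),
            (2, "snp_a", "AG"), (2, "snp_b", "GG"),
            (3, "snp_a", "GG"), (3, "snp_b", "AG"),
        ],
    )
    return conn


def extract_genotypes(conn, min_age, allowed_snps):
    """Return genotypes for individuals matching a phenotype criterion,
    limited to the SNPs that a user-defined filter permits."""
    if not allowed_snps:
        return []  # an empty filter grants access to nothing
    snps = sorted(allowed_snps)
    placeholders = ",".join("?" for _ in snps)
    query = f"""
        SELECT g.individual_id, g.snp, g.alleles
        FROM genotype AS g
        JOIN individual AS i ON i.id = g.individual_id
        WHERE i.age >= ? AND g.snp IN ({placeholders})
        ORDER BY g.individual_id, g.snp
    """
    return conn.execute(query, [min_age, *snps]).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Phenotype selection: age >= 50; SNP filter: only "snp_a" is visible.
    for row in extract_genotypes(conn, 50, {"snp_a"}):
        print(row)
```

In this sketch the SNP filter is applied inside the query rather than after retrieval, so data outside the permitted set never leaves the database; whether the published system works this way is not stated in the abstract.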