SNPFile--a software library and file format for large scale association mapping and population genetics studies.

Authors

Nielsen Jesper, Mailund Thomas

Affiliation

Bioinformatics Research Center, University of Aarhus, Denmark.

Publication

BMC Bioinformatics. 2008 Dec 8;9:526. doi: 10.1186/1471-2105-9-526.

DOI:10.1186/1471-2105-9-526

PMID:19063732

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2633306/

Abstract

BACKGROUND

High-throughput genotyping technology has enabled cost-effective typing of thousands of individuals at hundreds of thousands of markers for use in genome-wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to store and manipulate the data efficiently. While spreadsheets and flat text files were adequate solutions earlier, the increased data size mandates more efficient solutions.

RESULTS

We describe a new binary file format for SNP data, together with a software library for file manipulation. The file format stores genotype data together with any kind of additional data, using a flexible serialisation mechanism. The format is designed to be IO efficient for the access patterns of most multi-locus analysis methods.

CONCLUSION

The new file format has been very useful for our own studies, where it has significantly reduced the informatics burden of keeping track of various secondary data, and where its memory and I/O efficiency has greatly simplified analysis runs. A main limitation of the file format is that it is supported only by the very limited set of analysis tools developed in our own lab. This is somewhat alleviated by a scripting interface that makes it easy to write converters to and from the format.
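As a rough illustration of the idea in the Results paragraph (genotypes and secondary data kept in one binary file, laid out so that multi-locus methods can read a run of adjacent markers cheaply), the sketch below may help. It is not the SNPFile format or its library API: the magic bytes, the header layout, the JSON metadata block, the one-byte-per-genotype encoding, and the function names `write_genotypes` and `read_window` are all assumptions made for this example.

```python
import json
import mmap
import struct

# Hypothetical layout, NOT the SNPFile format:
#   header  : magic, n_markers, n_individuals, metadata length (little-endian)
#   metadata: UTF-8 JSON holding arbitrary secondary data
#   body    : marker-major matrix, one byte per genotype (0, 1, 2; 3 = missing)
MAGIC = b"GTSK"
HEADER = struct.Struct("<4sIII")


def write_genotypes(path, matrix, metadata):
    """Write a marker-major genotype matrix plus JSON metadata to `path`."""
    n_markers = len(matrix)
    n_individuals = len(matrix[0]) if matrix else 0
    meta = json.dumps(metadata).encode("utf-8")
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(MAGIC, n_markers, n_individuals, len(meta)))
        fh.write(meta)
        for row in matrix:
            if len(row) != n_individuals:
                raise ValueError("all marker rows must have the same length")
            fh.write(bytes(row))  # one contiguous row per marker


def read_window(path, first_marker, count):
    """Return (metadata, rows) for `count` consecutive markers.

    The file is memory-mapped, so only the pages holding the requested
    window (plus the header and metadata) need to be read from disk.
    """
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        magic, n_markers, n_ind, meta_len = HEADER.unpack_from(mm, 0)
        if magic != MAGIC:
            raise ValueError("not a file written by write_genotypes")
        if first_marker < 0 or count < 0 or first_marker + count > n_markers:
            raise IndexError("marker window out of range")
        meta_end = HEADER.size + meta_len
        metadata = json.loads(mm[HEADER.size:meta_end].decode("utf-8"))
        rows = [
            list(mm[meta_end + m * n_ind:meta_end + (m + 1) * n_ind])
            for m in range(first_marker, first_marker + count)
        ]
    return metadata, rows


if __name__ == "__main__":
    import os
    import tempfile

    demo = os.path.join(tempfile.mkdtemp(), "demo.bin")
    write_genotypes(
        demo,
        [[0, 1, 2], [2, 2, 0], [1, 3, 1], [0, 0, 0]],
        {"markers": ["rs1", "rs2", "rs3", "rs4"]},
    )
    print(read_window(demo, 1, 2))
```

Storing one row per marker means a sliding window over neighbouring markers maps to a single contiguous byte range, which is one plausible reading of "IO efficient for the access patterns of most multi-locus analysis methods"; the paper itself should be consulted for the layout SNPFile actually uses.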
