Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
Bioinformatics. 2011 Jan 15;27(2):281-3. doi: 10.1093/bioinformatics/btq643. Epub 2010 Dec 5.
The advent of next-generation sequencing for functional genomics has produced quantities of sequence information that are often too large to handle easily. Moreover, sequence reads from a specific individual can contain enough information to identify and genetically characterize that person, raising privacy concerns. To address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information while still supporting many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization it affords, MRF facilitates the decoupling of read alignment from downstream analyses.
RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.
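The two ideas in the abstract, summarizing an alignment without its sequence and computing an expression value from mapped reads, can be illustrated with a minimal C sketch. Everything in it is an assumption made for illustration: the `Alignment` struct, the colon-separated output line and the function names are hypothetical and are not the MRF specification or the RSEQtools API. RPKM (reads per kilobase of transcript per million mapped reads) is used as one common expression measure; the abstract does not say which measure the tools compute.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory alignment record (NOT the RSEQtools data structure). */
typedef struct {
    const char *chrom;     /* target name, e.g. "chr1" */
    char strand;           /* '+' or '-' */
    long start;            /* first aligned target position */
    long end;              /* last aligned target position */
    const char *sequence;  /* read bases: the potentially identifying part */
} Alignment;

/*
 * Write a coordinate-only summary of the alignment into buf.
 * The read sequence is deliberately left out, so the summary carries
 * where the read mapped but not the individual's bases.
 * The "chrom:strand:start:end" layout is an illustrative assumption.
 * Returns the number of characters written, or -1 on error/truncation.
 */
int format_anonymized(const Alignment *a, char *buf, size_t size)
{
    assert(a != NULL && buf != NULL);

    /* ':' is the field separator here, so it may not occur in the name. */
    if (strchr(a->chrom, ':') != NULL) {
        return -1;
    }

    int n = snprintf(buf, size, "%s:%c:%ld:%ld",
                     a->chrom, a->strand, a->start, a->end);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * RPKM = reads_in_gene * 1e9 / (gene_length_bp * total_mapped_reads).
 * Returns 0.0 when the gene length or library size is not positive.
 */
double rpkm(long reads_in_gene, long gene_length_bp, long total_mapped_reads)
{
    if (gene_length_bp <= 0 || total_mapped_reads <= 0) {
        return 0.0;
    }
    return (double)reads_in_gene * 1e9 /
           ((double)gene_length_bp * (double)total_mapped_reads);
}
```

Because both functions need only coordinates and counts, either could run on a sequence-free summary file, which is the sense in which a coordinate-based format decouples alignment from downstream analysis.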