Graduate Program in Structural & Computational Biology & Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA.
BMC Genomics. 2012;13 Suppl 8(Suppl 8):S9. doi: 10.1186/1471-2164-13-S8-S9. Epub 2012 Dec 17.
RNA sequencing (RNA-seq) has become a major tool for biomedical research. A key step in analyzing RNA-seq data is to infer the origin of short reads in the source genome, and many read alignment/mapping programs have been developed for this purpose. Usually, the majority of mappable reads can be mapped to a single unambiguous genomic location; these are called unique reads. However, a considerable proportion of mappable reads can be aligned to more than one genomic location with the same or similar fidelity; these are called "multireads". Allocating multireads is challenging but critical for interpreting RNA-seq data. We recently developed a Bayesian stochastic model that allocates multireads more accurately than alternative methods (Ji et al. Biometrics 2011).
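To make the distinction between unique reads and multireads concrete, the following is a minimal Python sketch that separates the two classes in SAM-formatted alignments. It is an illustration, not part of BM-Map: the function name `classify_reads` is hypothetical, and the sketch assumes single-end reads for which the aligner reports every candidate location as a separate record sharing the same read name (QNAME).

```python
from collections import defaultdict


def classify_reads(sam_lines):
    """Return (unique, multi): sets of read names mapped to one vs. several loci.

    Assumption: each candidate location of a read appears as its own SAM
    record with the same QNAME (single-end reads).
    """
    loci = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory fields)
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:
            continue  # FLAG bit 0x4 marks an unmapped read
        loci[qname].add((rname, pos))
    unique = {q for q, places in loci.items() if len(places) == 1}
    multi = {q for q, places in loci.items() if len(places) > 1}
    return unique, multi
```

Calling `classify_reads(open("alignments.sam"))` on a hypothetical file would return the two sets of read names; only the second set needs to be allocated among competing loci.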
To serve the broader biological community, we have implemented this method in a stand-alone, efficient, and user-friendly software package, BM-Map. BM-Map takes SAM (Sequence Alignment/Map), the most popular read alignment format, as its standard input. Based on the Bayesian model, it calculates the mapping probabilities of multireads for competing genomic loci, and it generates its output by adding these mapping probabilities to the original SAM file so that users can easily perform downstream analyses. The program is available for three common operating systems: Linux, Mac, and PC. Moreover, we have built a dedicated website, http://bioinformatics.mdanderson.org/main/BM-Map, which includes free downloads, detailed tutorials, and illustrative examples.
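The input/output pattern described above (read SAM records, attach a mapping probability to each candidate locus, write the records back out) can be sketched as follows. This is not BM-Map's implementation: the Bayesian model is replaced here by a uniform placeholder that gives each of a read's n candidate loci probability 1/n, and the function name `annotate_uniform` and the optional tag `ZP` are hypothetical choices made for illustration. The sketch assumes well-formed, single-end SAM records.

```python
from collections import Counter


def annotate_uniform(sam_lines, tag="ZP"):
    """Append a placeholder mapping probability to every mapped SAM record.

    Assumption: uniform allocation (1/n over a read's n reported loci)
    stands in for the Bayesian model; `tag` is a hypothetical user tag.
    """
    lines = [line.rstrip("\n") for line in sam_lines]

    # First pass: count the mapped records reported for each read name.
    counts = Counter()
    for line in lines:
        if line and not line.startswith("@"):
            fields = line.split("\t")
            if not int(fields[1]) & 0x4:  # skip unmapped reads
                counts[fields[0]] += 1

    # Second pass: copy headers and unmapped records, annotate the rest.
    out = []
    for line in lines:
        if not line or line.startswith("@"):
            out.append(line)
            continue
        fields = line.split("\t")
        if int(fields[1]) & 0x4:
            out.append(line)
            continue
        prob = 1.0 / counts[fields[0]]
        out.append(f"{line}\t{tag}:f:{prob:.4f}")
    return out
```

Because the probability is stored as an extra optional field, the original columns are left untouched and downstream tools that ignore unknown tags can still read the file.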
We have developed a stand-alone, efficient, and user-friendly software package for accurately allocating multireads, an important complement to our previous methodology paper. We believe that this bioinformatics tool will greatly help RNA-seq and related applications reach their full potential in life science research.