Ye Yuting, Li Jingyi Jessica
Division of Biostatistics, University of California, Berkeley, 94720, Berkeley, CA, USA.
Department of Statistics, 8125 Math Sciences Bldg., University of California, Los Angeles, Los Angeles, 90095-1554, CA, USA.
BMC Genomics. 2016 Jan 11;17 Suppl 1(Suppl 1):11. doi: 10.1186/s12864-015-2304-8.
The advent of next-generation RNA sequencing (RNA-seq) has greatly advanced transcriptomic studies, including system-wide identification and quantification of mRNA isoforms under various biological conditions. A number of computational methods have been developed to systematically identify mRNA isoforms in a high-throughput manner from RNA-seq data. However, a common drawback of these methods is that the mRNA isoforms they identify contain a high percentage of false positives, especially for genes with complex splicing structures, such as genes with many exons and exon junctions.
We have developed a preselection method called "Non-negative Matrix Factorization Preselection" (NMFP), which is designed to improve the accuracy of computational methods in identifying mRNA isoforms from RNA-seq data. We demonstrated through simulation and real data studies that NMFP can effectively shrink the search space of isoform candidates and increase the accuracy of two mainstream computational methods, Cufflinks and SLIDE, in identifying mRNA isoforms.
NMFP is a useful tool to preselect mRNA isoform candidates for downstream isoform discovery methods. It can greatly reduce the number of isoform candidates while maintaining good coverage of unknown true isoforms. With NMFP added as an upstream step, computational methods are expected to achieve better accuracy in identifying mRNA isoforms from RNA-seq data.
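The abstract does not spell out how the preselection works, so the following is only a minimal sketch of the general idea of using non-negative matrix factorization to propose isoform candidates, not the authors' NMFP implementation. It assumes a hypothetical exon-by-sample matrix of read coverage, factors it with the standard multiplicative-update rule, and binarizes each learned component into an exon-inclusion pattern. The function names (`nmf`, `candidate_patterns`), the 0.5 threshold, and the toy data are all illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H with W, H >= 0,
    using multiplicative updates for the squared-error loss."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def candidate_patterns(W, threshold=0.5):
    """Turn each column of W into a 0/1 exon-inclusion pattern.
    Hypothetical rule: keep an exon if its loading is at least
    `threshold` times the largest loading in that column."""
    scaled = W / (W.max(axis=0, keepdims=True) + 1e-12)
    patterns = {tuple(int(x) for x in (scaled[:, k] >= threshold))
                for k in range(W.shape[1])}
    return sorted(patterns)

# Toy data (assumed): 4 exons, 2 isoforms, 4 samples.
isoforms = np.array([[1, 1, 0, 1],    # isoform A skips exon 3
                     [1, 0, 1, 1]])   # isoform B skips exon 2
abundance = np.array([[10.0, 0.0, 6.0, 3.0],
                      [0.0, 8.0, 4.0, 9.0]])
V = isoforms.T @ abundance            # exon-by-sample coverage

W, H = nmf(V, rank=2)
patterns = candidate_patterns(W)
print(patterns)
```

In a pipeline of this shape, the binarized patterns would be passed to a downstream isoform discovery method as a reduced candidate set. Because NMF solutions depend on the chosen rank and the random initialization, the patterns are candidates to be checked downstream, not confirmed isoforms.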