Kumar Shailesh, Vo Angie Duy, Qin Fujun, Li Hui
Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA 22908.
Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908.
Sci Rep. 2016 Feb 10;6:21597. doi: 10.1038/srep21597.
RNA-Seq has made possible the global identification of fusion transcripts, i.e. "chimeric RNAs". Although various software packages have been developed for this purpose, they behave differently on the different datasets provided by their respective developers. Both users and developers need an unbiased assessment of the performance of existing fusion detection tools. Toward this goal, we compared the performance of 12 well-known fusion detection software packages. We evaluated the sensitivity, false discovery rate, computing time, and memory usage of these tools on four different datasets (positive, negative, mixed, and test). We conclude that some tools are better than others in terms of sensitivity, positive predictive value, time consumption, and memory usage. We also observed only a small overlap among the fusions detected by different tools in the real dataset (the test dataset). This could be due to false discoveries by various tools, but could also be because no single tool is comprehensive. We found that the performance of the tools depends on the quality, read length, and number of reads of the RNA-Seq data. We recommend that users choose tools suited to their purpose based on the properties of their RNA-Seq data.
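The evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a fusion call is represented as an ordered (5' gene, 3' gene) pair and that a call counts as correct only when it matches a known fusion exactly. The function names `evaluate` and `pairwise_overlap`, the tool names, and the toy call sets are all hypothetical.

```python
from itertools import combinations


def evaluate(predicted, truth):
    """Score one tool's fusion calls against a set of known fusions.

    Assumption: fusions are ordered (5' gene, 3' gene) tuples and are
    matched by exact equality of the pair.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # known fusions that were called
    fp = len(predicted - truth)   # calls that are not known fusions
    fn = len(truth - predicted)   # known fusions that were missed

    nan = float("nan")
    sensitivity = tp / (tp + fn) if truth else nan
    ppv = tp / (tp + fp) if predicted else nan   # positive predictive value
    fdr = fp / (tp + fp) if predicted else nan   # false discovery rate = 1 - PPV
    return {"sensitivity": sensitivity, "ppv": ppv, "fdr": fdr}


def pairwise_overlap(calls_by_tool):
    """Count fusions shared by each pair of tools."""
    return {
        (a, b): len(set(calls_by_tool[a]) & set(calls_by_tool[b]))
        for a, b in combinations(sorted(calls_by_tool), 2)
    }


# Toy data for illustration only; not results from the study.
truth = {("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("EML4", "ALK"), ("PML", "RARA")}
calls = {
    "tool_A": {("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("GENE1", "GENE2")},
    "tool_B": {("BCR", "ABL1"), ("EML4", "ALK")},
}

scores = {tool: evaluate(found, truth) for tool, found in calls.items()}
overlap = pairwise_overlap(calls)
```

In this toy case `tool_A` recovers two of the four known fusions and makes one unsupported call, so its sensitivity is 0.5 and its false discovery rate is one third. The two tools share only one call, which mirrors the kind of small overlap described for the real dataset.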