ArtiFuse: computational validation of fusion gene detection tools without relying on simulated reads.

Author information

TRON - Translational Oncology at the University Medical Center of Johannes Gutenberg University Mainz gGmbH, Mainz 55131, Germany.

Publication information

Bioinformatics. 2020 Jan 15;36(2):373-379. doi: 10.1093/bioinformatics/btz613.

Abstract

MOTIVATION

Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Often simulated reads are used, but these cannot account for all technical biases in RNA-seq data generated from real samples.

RESULTS

Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification to the genomic reference and can therefore be applied to any RNA-seq dataset without the need for any simulated reads. We demonstrate our approach on eight RNA-seq datasets for three fusion gene prediction tools: average recall values peak for all three tools between 0.4 and 0.56 for high-quality and high-coverage datasets. As ArtiFuse affords total control over the involved genes and breakpoint positions, we also assessed performance with regard to gene-related properties, showing a drop in recall for low-expressed genes in high-coverage samples and for genes with co-expressed paralogues. Overall tool performance assessed from ArtiFusions is lower than previously reported estimates on simulated reads. Due to the use of real RNA-seq datasets, we believe that ArtiFuse provides a more realistic benchmark that can be used to develop more accurate fusion gene prediction tools for application in clinical settings.
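The recall values quoted above can be read as the fraction of introduced artificial fusions that a tool reports. The following is a minimal sketch of that calculation, not code from the ArtiFuse repository: the function name `recall`, the representation of a fusion as a (5' gene, 3' gene) tuple, and the placeholder gene names are all assumptions made for illustration.

```python
def recall(introduced, predicted):
    """Return the fraction of introduced fusions that appear among the predictions.

    Both arguments are iterables of (five_prime_gene, three_prime_gene) tuples.
    This pairing scheme is an illustrative assumption, not ArtiFuse's own format.
    """
    introduced = set(introduced)
    if not introduced:
        raise ValueError("no introduced fusions to evaluate")
    # True positives: introduced fusions that the tool also reported.
    true_positives = introduced & set(predicted)
    return len(true_positives) / len(introduced)


# Hypothetical data: five artificial fusions and one tool's predictions.
introduced_fusions = [
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_E", "GENE_F"),
    ("GENE_G", "GENE_H"),
    ("GENE_I", "GENE_J"),
]
predicted_fusions = [
    ("GENE_A", "GENE_B"),
    ("GENE_E", "GENE_F"),
    ("GENE_X", "GENE_Y"),  # not introduced, so it does not count toward recall
]

example_recall = recall(introduced_fusions, predicted_fusions)
print(f"recall = {example_recall:.2f}")
```

Because every artificial fusion is known in advance, recall can also be computed separately for subsets of fusions, for example those involving low-expressed genes, by filtering `introduced_fusions` before calling the function.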

AVAILABILITY AND IMPLEMENTATION

ArtiFuse is implemented in Python. The source code and documentation are available at https://github.com/TRON-Bioinformatics/ArtiFusion.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
