OutSingle: a novel method of detecting and injecting outliers in RNA-Seq count data using the optimal hard threshold for singular values.

Affiliations

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.

Publication information

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad142.

Abstract

MOTIVATION

Finding outliers in RNA-sequencing (RNA-Seq) gene expression (GE) data can help in identifying genes that are aberrant and cause Mendelian disorders. Recently developed models for this task rely on modeling RNA-Seq GE data using the negative binomial distribution (NBD). However, some of those models infer the NBD's parameters with unbiased procedures that are computationally demanding and thus make confounder control challenging, while others rely on less computationally demanding but biased procedures and convoluted confounder control approaches that hinder interpretability.

RESULTS

In this article, we present OutSingle (Outlier detection using Singular Value Decomposition), an almost instantaneous way of detecting outliers in RNA-Seq GE data. It uses a simple log-normal approach for count modeling. For confounder control, it uses the recently discovered optimal hard threshold (OHT) method for noise detection, which itself is based on singular value decomposition (SVD). Because it relies only on SVD and OHT, OutSingle's model is straightforward to understand and interpret. We then show that our novel method, when used on RNA-Seq GE data with real biological outliers masked by confounders, outcompetes the previous state-of-the-art model based on an ad hoc denoising autoencoder. Additionally, OutSingle can be used to inject artificial outliers masked by confounders, which is difficult to achieve with previous approaches. We describe a way of using OutSingle for outlier injection and proceed to show how OutSingle outperforms its competition on 16 out of 18 datasets that were generated from three real datasets using OutSingle's injection procedure with different outlier types and magnitudes. Our methods are applicable to other similar problems that involve finding outliers in matrices in the presence of confounders.
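The SVD/OHT idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository listed under Availability and Implementation for that); it is one plausible reading of the abstract. The function names (`omega`, `oht_rank`, `outlier_zscores`), the `log2(count + 1)` transform, and the per-gene z-score steps are assumptions made for illustration. The threshold uses the published approximation for the unknown-noise case of the optimal hard threshold (Gavish and Donoho, 2014): keep singular values above omega(beta) times the median singular value, where beta is the aspect ratio of the matrix.

```python
from typing import Tuple

import numpy as np


def omega(beta: float) -> float:
    """Approximate OHT coefficient when the noise level is unknown.

    beta is the matrix aspect ratio (shorter side / longer side), 0 < beta <= 1.
    """
    return 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43


def oht_rank(singular_values: np.ndarray, shape: Tuple[int, int]) -> int:
    """Count the singular values that exceed the optimal hard threshold."""
    m, n = min(shape), max(shape)
    tau = omega(m / n) * np.median(singular_values)
    return int(np.sum(singular_values > tau))


def row_zscores(x: np.ndarray) -> np.ndarray:
    """Standardize each row (gene) to zero mean and unit sample variance."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / std  # assumes no gene is constant across samples


def outlier_zscores(counts: np.ndarray) -> Tuple[np.ndarray, int]:
    """Hypothetical pipeline: log-normal z-scores with SVD/OHT confounder removal.

    counts is a genes x samples matrix of non-negative read counts.
    Returns the confounder-corrected z-scores and the number of removed components.
    """
    # Log-normal modeling (assumed form): z-scores of log-transformed counts.
    z = row_zscores(np.log2(counts + 1.0))

    # Components above the threshold are treated as confounder signal.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    k = oht_rank(s, z.shape)
    confounders = (u[:, :k] * s[:k]) @ vt[:k]

    # Outlier scores are the re-standardized residuals.
    return row_zscores(z - confounders), k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo_counts = rng.poisson(lam=100, size=(50, 200)).astype(float)
    scores, n_confounders = outlier_zscores(demo_counts)
    print(scores.shape, n_confounders)
```

In this sketch, entries with a large absolute score would be flagged as candidate outliers; the cutoff and any multiple-testing correction are left open because the abstract does not specify them.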

AVAILABILITY AND IMPLEMENTATION

The code for OutSingle is available at https://github.com/esalkovic/outsingle.
