OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data.

Author affiliations

Department of Informatics, Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany.

Department of Informatics, Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany; Quantitative Biosciences Munich, Gene Center, Department of Biochemistry, Ludwig-Maximilians Universität München, Feodor-Lynen-Str. 25, 81377 München, Germany.

Publication information

Am J Hum Genet. 2018 Dec 6;103(6):907-917. doi: 10.1016/j.ajhg.2018.10.025. Epub 2018 Nov 29.

Abstract

RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (Outlier in RNA-Seq Finder), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read-count expectations according to the gene covariation resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best recall of artificially corrupted data. Precision-recall analyses using simulated outlier read counts demonstrated the importance of controlling for covariation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a dataset, for identifying outlier samples with too many aberrantly expressed genes, and for detecting aberrant gene expression on the basis of false-discovery-rate-adjusted p values. Overall, OUTRIDER provides an end-to-end solution for identifying aberrantly expressed genes and is suitable for use by rare-disease diagnostic platforms.
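
The significance step described in the abstract can be illustrated with a minimal sketch. It is not the OUTRIDER package's code or API. It assumes the expected counts are already given (in OUTRIDER they come from the autoencoder), uses a negative binomial with mean `mu` and size parameter `theta` (variance `mu + mu**2 / theta`), takes one common two-sided p-value convention, and swaps in the Benjamini-Hochberg procedure as the FDR adjustment, since the abstract does not name one. All function names and the example numbers are hypothetical.

```python
import math


def nb_logpmf(k, mu, theta):
    """Log-probability of count k under a negative binomial with
    mean mu and size theta (assumed parameterization)."""
    return (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))


def nb_cdf(k, mu, theta):
    """P(X <= k), computed by direct summation (fine for moderate counts)."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_logpmf(i, mu, theta)) for i in range(k + 1))
    return min(1.0, total)


def two_sided_pvalue(k, mu, theta):
    """Two-sided p-value: twice the smaller tail, capped at 1 (assumed convention)."""
    lower = nb_cdf(k, mu, theta)                       # P(X <= k)
    upper = max(0.0, 1.0 - nb_cdf(k - 1, mu, theta))   # P(X >= k)
    return 2.0 * min(0.5, lower, upper)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical read counts of one gene across five samples.
counts = [95, 110, 102, 5, 98]
mu, theta = 100.0, 20.0  # illustrative expected count and gene-specific size

pvals = [two_sided_pvalue(k, mu, theta) for k in counts]
padj = benjamini_hochberg(pvals)
outliers = [i for i, p in enumerate(padj) if p < 0.05]
print(outliers)
```

In this toy example only the sample with 5 reads falls far enough into the lower tail to pass the adjusted threshold. In practice the adjustment would typically run over all genes of a sample rather than over five values.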
