Probabilistic outlier identification for RNA sequencing generalized linear models.

Authors

Mangiola Stefano, Thomas Evan A, Modrák Martin, Vehtari Aki, Papenfuss Anthony T

Affiliations

The Walter and Eliza Hall Institute, Parkville, Victoria, 3052, Australia.

Institute of Microbiology of the Czech Academy of Sciences, Prague, 1083, Czech Republic.

Publication information

NAR Genom Bioinform. 2021 Mar 1;3(1):lqab005. doi: 10.1093/nargab/lqab005. eCollection 2021 Mar.

DOI:10.1093/nargab/lqab005
PMID:33709073
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7936652/
Abstract

Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods that are based on a negative binomial model are not robust to extreme outliers, which we found to be abundant in public datasets. So far, no rigorous and probabilistic methods for detection of outliers have been developed for RNA sequencing data, leaving the identification mostly to visual inspection. Recent advances in Bayesian computation allow large-scale comparison of observed data against its theoretical distribution given in a statistical model. Here we propose ppcseq, a key quality-control tool for identifying transcripts that include outlier data points in differential expression analysis, which do not follow a negative binomial distribution. Applying ppcseq to analyse several publicly available datasets using popular tools, we show that from 3 to 10 percent of differentially abundant transcripts across algorithms and datasets had statistics inflated by the presence of outliers.
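The idea of comparing observed counts against the distribution implied by a negative binomial model can be illustrated with a much simplified sketch. This is not the ppcseq implementation: ppcseq fits a Bayesian model and uses its posterior predictive distribution, whereas the sketch below assumes the mean (`mu`) and dispersion (`size`) of one transcript are already known and fixed. The function names, the 99% coverage, and the example counts are all hypothetical choices made for illustration.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log pmf of a negative binomial with mean `mu` and dispersion `size`.

    Parameterised so that variance = mu + mu**2 / size.
    """
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def nb_quantile(p, mu, size, max_k=1_000_000):
    """Smallest count k whose cumulative probability reaches p."""
    cdf = 0.0
    for k in range(max_k):
        cdf += math.exp(nb_log_pmf(k, mu, size))
        if cdf >= p:
            return k
    raise ValueError("quantile not reached; increase max_k")


def flag_outliers(counts, mu, size, coverage=0.99):
    """Flag counts that fall outside the central `coverage` interval.

    Assumption: `mu` and `size` are known. A real analysis would estimate
    them (with uncertainty) from the data rather than fix them by hand.
    """
    tail = (1.0 - coverage) / 2.0
    lower = nb_quantile(tail, mu, size)
    upper = nb_quantile(1.0 - tail, mu, size)
    return [not (lower <= c <= upper) for c in counts]


if __name__ == "__main__":
    # Hypothetical read counts for one transcript across five samples.
    counts = [95, 102, 110, 98, 5000]
    for count, is_outlier in zip(counts, flag_outliers(counts, mu=100.0, size=10.0)):
        print(count, "outlier" if is_outlier else "ok")
```

A fixed-parameter interval like this ignores uncertainty in the fitted model, which is one reason a posterior predictive approach, as described in the abstract, is more rigorous.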

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f38/7936652/3fb6e7247cab/lqab005fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f38/7936652/c2f44f36505e/lqab005fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f38/7936652/d5b9399b9554/lqab005fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f38/7936652/340ff85572c3/lqab005fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f38/7936652/9e67f18a8e83/lqab005fig5.jpg
