Perplexity: evaluating transcript abundance estimation in the absence of ground truth.

Authors

Fan Jason, Chan Skylar, Patro Rob

Affiliation

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA.

Publication

Algorithms Mol Biol. 2022 Mar 25;17(1):6. doi: 10.1186/s13015-022-00214-y.

DOI:10.1186/s13015-022-00214-y
PMID:35331283
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8951746/
Abstract

BACKGROUND

There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpins gene expression-based analyses routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best.

RESULTS

We derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. Furthermore, we demonstrate theoretically and experimentally that perplexity can be computed for arbitrary transcript abundance estimation models.
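The abstract does not state the formula. As a minimal sketch, assume perplexity takes the form familiar from language models: the exponential of the negative mean log-likelihood of a held-out fragment set under the estimated abundances. The function name `perplexity`, the input layout, and the additive `smoothing` constant below are illustrative assumptions, not the authors' implementation. The paper's own handling of RNA-seq corner cases (for example, held-out fragments that map only to transcripts estimated at zero abundance) may well differ.

```python
import math


def perplexity(abundances, fragments, smoothing=1e-8):
    """Perplexity of held-out fragments under estimated abundances.

    abundances: dict mapping transcript id -> estimated fraction (sums to 1).
    fragments:  list of dicts, one per held-out fragment, mapping each
                compatible transcript id -> P(fragment | transcript).
    smoothing:  hypothetical additive constant so that fragments compatible
                only with zero-abundance transcripts keep a nonzero
                probability (an assumption made for this sketch).
    """
    if not fragments:
        raise ValueError("need at least one held-out fragment")

    # Additive smoothing followed by renormalisation.
    total = sum(abundances.values()) + smoothing * len(abundances)
    smoothed = {t: (a + smoothing) / total for t, a in abundances.items()}

    # Sum of log P(fragment | abundances) over the held-out set.
    log_lik = 0.0
    for frag in fragments:
        p = sum(smoothed[t] * cond for t, cond in frag.items())
        log_lik += math.log(p)

    # Exponentiated negative mean log-likelihood; lower is better.
    return math.exp(-log_lik / len(fragments))


# Toy held-out set: three fragments from transcript A, one from B.
held_out = [{"A": 1.0}, {"A": 1.0}, {"A": 1.0}, {"B": 1.0}]
matched = perplexity({"A": 0.75, "B": 0.25}, held_out)
uniform = perplexity({"A": 0.5, "B": 0.5}, held_out)
sparse = perplexity({"A": 1.0, "B": 0.0}, held_out)
```

In this toy setting the estimate that matches the held-out proportions (`matched`) scores lower than the uniform one. The overly sparse estimate (`sparse`) stays finite only because of the smoothing term, which is the kind of corner case the abstract alludes to.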

CONCLUSIONS

Alongside the derivation and implementation of perplexity for transcript abundance estimation, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/85d89626a02e/13015_2022_214_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/7ab7f7b576b9/13015_2022_214_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/d560eb6ee7b7/13015_2022_214_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/64ff0e2a6f51/13015_2022_214_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/4df693850aec/13015_2022_214_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/0a681d16a8d4/13015_2022_214_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/f809b03cf4ce/13015_2022_214_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/8e7cee7f56bf/13015_2022_214_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/3f636b2129b8/13015_2022_214_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/315539bea4d3/13015_2022_214_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/5b38849dd368/13015_2022_214_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/50ad611d8fd9/13015_2022_214_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/770bed87e88f/13015_2022_214_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/2ee89ab716b7/13015_2022_214_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/8da4692d7dab/13015_2022_214_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b34c/8951746/f48cbfbac21b/13015_2022_214_Fig16_HTML.jpg
