Sample Size Estimation for Detection of Splicing Events in Transcriptome Sequencing Data.

Affiliations

Department for Anaesthesiology, Heinrich Heine University, 40225 Düsseldorf, Germany.

BMFZ, Heinrich Heine University, 40225 Düsseldorf, Germany.

Publication information

Int J Mol Sci. 2017 Sep 5;18(9):1900. doi: 10.3390/ijms18091900.

DOI:10.3390/ijms18091900
PMID:28872584
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5618549/
Abstract

Merging data from multiple samples is required to detect lowly expressed transcripts or splicing events that may be present in only a subset of samples. However, the exact number of replicates required to detect such rare events often remains unclear, although it can be approached through probability theory. Here, we describe a probabilistic model that relates the number of events observed in a batch of samples to observation probabilities. In this model, samples appear as a heterogeneous collection of events, each of which is observed with some probability. The model is evaluated in a batch of 54 transcriptomes of human dermal fibroblast samples. The majority of putative splice-sites (alignment gap-sites) are detected either in (almost) all samples or only sporadically, resulting in a U-shaped pattern of observation probabilities. The probabilistic model systematically underestimates event numbers because of a bias resulting from finite sampling. However, with an additional assumption, the model predicts observed event numbers within a <10% deviation from the median. Single samples contain a considerable number of uniquely observed putative splicing events (mean 7122 in TopHat alignments and 86,215 in STAR alignments). We conclude that the probabilistic model provides an adequate description of the observation of gap-sites in transcriptome data. Thus, required sample sizes can be calculated by applying a simple binomial model to sporadically observed random events. Given the large number of uniquely observed putative splice-sites and the known stochastic noise in the splicing machinery, it appears advisable to include the observation of rare splicing events in analysis objectives. Therefore, it is beneficial to take scores for the validation of gap-sites into account.
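The abstract states that required sample sizes can be obtained from a simple binomial model. Below is a minimal sketch of that kind of reasoning, assuming each putative event is observed independently in each sample with a fixed per-sample probability p. This is an illustrative assumption, not the authors' code, and it omits their finite-sampling correction. Under it, an event is seen at least once in n samples with probability 1 - (1 - p)^n, so reaching a target detection probability P needs n = ceil(ln(1 - P) / ln(1 - p)) samples. Summing the detection probability over a set of events gives the expected number of distinct events observed in a batch. The function names are hypothetical.

```python
import math
from typing import Sequence


def detection_probability(p: float, n: int) -> float:
    """Probability that an event with per-sample observation probability p
    is seen in at least one of n independent samples: 1 - (1 - p)^n.
    (Independence across samples is an assumption of this sketch.)"""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def required_samples(p: float, target: float) -> int:
    """Smallest n with 1 - (1 - p)^n >= target, for 0 < p < 1 and 0 < target < 1."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < p < 1 and 0 < target < 1")
    # Closed-form estimate from the binomial "at least one success" condition.
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    # Adjust by one step if floating-point rounding put n on the wrong side.
    while detection_probability(p, n) < target:
        n += 1
    while n > 1 and detection_probability(p, n - 1) >= target:
        n -= 1
    return n


def expected_observed_events(probs: Sequence[float], n: int) -> float:
    """Expected number of distinct events seen at least once in n samples,
    given one per-sample observation probability per event."""
    return sum(detection_probability(p, n) for p in probs)


if __name__ == "__main__":
    # An event present in ~5% of samples, detected with >= 95% probability.
    print(required_samples(0.05, 0.95))  # 59
```

For example, an event observed in about 5% of samples needs 59 samples to be detected at least once with 95% probability. The two adjustment loops only guard against floating-point rounding around the closed-form estimate. Feeding `expected_observed_events` a U-shaped set of probabilities (many near 0, many near 1) shows how sporadic events keep adding to the observed total as n grows.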

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5cc/5618549/1d1ae5ba3698/ijms-18-01900-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5cc/5618549/0f803546ec52/ijms-18-01900-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5cc/5618549/e52a14e3b24f/ijms-18-01900-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5cc/5618549/744e4e606a9b/ijms-18-01900-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5cc/5618549/d4732c91eb92/ijms-18-01900-g005.jpg

Similar articles

1
Sample Size Estimation for Detection of Splicing Events in Transcriptome Sequencing Data.
Int J Mol Sci. 2017 Sep 5;18(9):1900. doi: 10.3390/ijms18091900.
2
Validation of Splicing Events in Transcriptome Sequencing Data.
Int J Mol Sci. 2017 May 23;18(6):1110. doi: 10.3390/ijms18061110.
3
Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data.
BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):503. doi: 10.1186/s12864-016-2896-7.
4
ASGAL: aligning RNA-Seq data to a splicing graph to detect novel alternative splicing events.
BMC Bioinformatics. 2018 Nov 20;19(1):444. doi: 10.1186/s12859-018-2436-3.
5
BRIE: transcriptome-wide splicing quantification in single cells.
Genome Biol. 2017 Jun 27;18(1):123. doi: 10.1186/s13059-017-1248-5.
6
Gene expression and splicing alterations analyzed by high throughput RNA sequencing of chronic lymphocytic leukemia specimens.
BMC Cancer. 2015 Oct 16;15:714. doi: 10.1186/s12885-015-1708-9.
7
rbamtools: an R interface to samtools enabling fast accumulative tabulation of splicing events over multiple RNA-seq samples.
Bioinformatics. 2015 May 15;31(10):1663-4. doi: 10.1093/bioinformatics/btu846. Epub 2015 Jan 5.
8
CIDANE: comprehensive isoform discovery and abundance estimation.
Genome Biol. 2016 Jan 30;17:16. doi: 10.1186/s13059-015-0865-0.
9
Deep RNA sequencing reveals a high frequency of alternative splicing events in the fungus Trichoderma longibrachiatum.
BMC Genomics. 2015 Feb 6;16(1):54. doi: 10.1186/s12864-015-1251-8.
10
A probabilistic framework for aligning paired-end RNA-seq data.
Bioinformatics. 2010 Aug 15;26(16):1950-7. doi: 10.1093/bioinformatics/btq336. Epub 2010 Jun 23.

Cited by

1
CRISPR activation enables high-fidelity reprogramming into human pluripotent stem cells.
Stem Cell Reports. 2022 Feb 8;17(2):413-426. doi: 10.1016/j.stemcr.2021.12.017. Epub 2022 Jan 20.
2
Fingerprints of Modified RNA Bases from Deep Sequencing Profiles.
J Am Chem Soc. 2017 Nov 29;139(47):17074-17081. doi: 10.1021/jacs.7b07914. Epub 2017 Nov 17.
