Hierarchical probabilistic models for multiple gene/variant associations based on next-generation sequencing data.

Affiliations

The Nuffield Division of Clinical Laboratory Sciences.

The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK.

Publication information

Bioinformatics. 2017 Oct 1;33(19):3058-3064. doi: 10.1093/bioinformatics/btx355.

DOI:10.1093/bioinformatics/btx355
PMID:28575251
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5637939/
Abstract

MOTIVATION

The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability.

RESULTS

We develop tractable count-based models, which are amenable to efficient estimation through the introduction of latent variables and the appropriate application of recent statistical theory in a sparse Bayesian modelling framework. Furthermore, we examine several transformation methods for RNA-seq read counts and we introduce arcsin, logit and Laplace smoothing as preprocessing steps for transformation-based models. Using natural and carefully simulated data from the 1000 Genomes and gEUVADIS projects, we benchmark both approaches under a variety of scenarios, including the presence of noise and violation of basic model assumptions. We demonstrate that an arcsin transformation of Laplace-smoothed data is at least as good as state-of-the-art models, particularly at small samples. Furthermore, we show that an over-dispersed Poisson model is comparable to the celebrated Negative Binomial, but much easier to estimate. These results provide strong support for transformation-based versus count-based (particularly Negative-Binomial-based) models for eQTL mapping.
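
The preprocessing steps named above (Laplace smoothing followed by an arcsin or logit transformation) can be sketched as follows. The abstract does not give the exact formulas, so this is only an illustration under stated assumptions: smoothing is taken to be an add-alpha pseudo-count applied to per-sample relative frequencies, and the arcsin transformation is taken to be the common arcsine-square-root form. The function names are hypothetical and are not the eQTLseq API.

```python
import numpy as np


def laplace_smooth(counts, alpha=1.0):
    """Turn a samples x genes count matrix into smoothed proportions.

    Assumption: add-alpha (Laplace) smoothing per sample, so that no
    proportion is exactly 0 or 1 (requires at least two genes).
    """
    counts = np.asarray(counts, dtype=float)
    n_genes = counts.shape[1]
    totals = counts.sum(axis=1, keepdims=True)
    return (counts + alpha) / (totals + alpha * n_genes)


def arcsin_transform(props):
    """Arcsine-square-root transform of proportions (assumed form)."""
    return np.arcsin(np.sqrt(props))


def logit_transform(props):
    """Log-odds transform; finite because smoothing keeps props in (0, 1)."""
    return np.log(props) - np.log1p(-props)


# Toy read counts: 2 samples x 3 genes, including zero counts.
counts = np.array([[0, 10, 90], [5, 0, 45]])
props = laplace_smooth(counts)
print(arcsin_transform(props).round(3))
print(logit_transform(props).round(3))
```

Smoothing before transforming matters mainly for the logit, which is undefined at proportions of exactly 0 or 1; the transformed values can then be passed to a model that assumes approximately Gaussian responses.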

AVAILABILITY AND IMPLEMENTATION

All methods are implemented in the free software eQTLseq: https://github.com/dvav/eQTLseq.

CONTACT

dimitris.vavoulis@well.ox.ac.uk.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a00b/5637939/9fa840d6e34f/btx355f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a00b/5637939/0e96d3e676ed/btx355f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a00b/5637939/8c48575fd0f1/btx355f3.jpg
