Gaussian process test for high-throughput sequencing time series: application to experimental evolution.

Author information

Topa Hande, Jónás Ágnes, Kofler Robert, Kosiol Carolin, Honkela Antti

Affiliations

Helsinki Institute for Information Technology (HIIT), Department of Information and Computer Science, Aalto University, Espoo, Finland, Institut für Populationsgenetik, Vetmeduni Vienna, 1210 Wien, Austria, Vienna Graduate School of Population Genetics, Wien, Austria and Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Helsinki, Finland.

Publication information

Bioinformatics. 2015 Jun 1;31(11):1762-70. doi: 10.1093/bioinformatics/btv014. Epub 2015 Jan 21.

DOI:10.1093/bioinformatics/btv014

PMID:25614471

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4443671/

Abstract

MOTIVATION

Recent advances in high-throughput sequencing (HTS) have made it possible to monitor genomes in great detail. New experiments not only use HTS to measure genomic features at one time point but also monitor them changing over time with the aim of identifying significant changes in their abundance. In population genetics, for example, allele frequencies are monitored over time to detect significant frequency changes that indicate selection pressures. Previous attempts at analyzing data from HTS experiments have been limited as they could not simultaneously include data at intermediate time points, replicate experiments and sources of uncertainty specific to HTS such as sequencing depth.

RESULTS

We present the beta-binomial Gaussian process model for ranking features with significant non-random variation in abundance over time. The features are assumed to represent proportions, such as the proportion of an alternative allele in a population. We use the beta-binomial model to capture the uncertainty arising from finite sequencing depth and combine it with a Gaussian process model over the time series. In simulations that mimic the features of experimental evolution data, the proposed method clearly outperforms classical testing in average precision of finding selected alleles. We also present simulations exploring different experimental design choices and results on real data from a Drosophila experimental evolution experiment on temperature adaptation.
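
The combination described above can be illustrated with a minimal sketch. It is not the authors' R implementation: it assumes a uniform Beta(1, 1) prior on the allele frequency, a squared-exponential covariance, and fixed hyperparameters (`signal_var`, `length_scale`, `extra_noise`), all of which are illustrative choices made here. The posterior variance of the frequency at each time point, which shrinks as sequencing depth grows, is used as that point's noise variance in the Gaussian process, and a log Bayes factor compares a time-dependent model against a noise-only model.

```python
import numpy as np


def beta_posterior_moments(alt_reads, depth, a=1.0, b=1.0):
    """Posterior mean and variance of a proportion under a Beta(a, b) prior
    after observing alt_reads successes out of depth reads."""
    alt_reads = np.asarray(alt_reads, dtype=float)
    depth = np.asarray(depth, dtype=float)
    alpha = a + alt_reads
    beta = b + depth - alt_reads
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, var


def rbf_kernel(t, signal_var, length_scale):
    """Squared-exponential covariance matrix over the time points t."""
    diff = t[:, None] - t[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gaussian_log_density(y, cov):
    """Log density of y under a zero-mean multivariate normal with covariance cov."""
    chol = np.linalg.cholesky(cov)
    whitened = np.linalg.solve(chol, y)
    return (-0.5 * whitened @ whitened
            - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2.0 * np.pi))


def log_bayes_factor(t, alt_reads, depth,
                     signal_var=0.05, length_scale=30.0, extra_noise=1e-4):
    """Log Bayes factor of a time-dependent GP against a noise-only model.

    Hyperparameters are fixed for illustration; a real analysis would
    estimate them from the data.
    """
    t = np.asarray(t, dtype=float)
    mean, var = beta_posterior_moments(alt_reads, depth)
    y = mean - mean.mean()                 # centre the series
    noise = np.diag(var + extra_noise)     # depth-dependent noise per time point
    time_dependent = gaussian_log_density(
        y, rbf_kernel(t, signal_var, length_scale) + noise)
    time_independent = gaussian_log_density(y, noise)
    return time_dependent - time_independent


# Hypothetical read counts at seven time points, depth 100 each.
generations = [0, 10, 20, 30, 40, 50, 60]
depth = [100] * 7
rising = log_bayes_factor(generations, [10, 20, 30, 45, 55, 70, 80], depth)
flat = log_bayes_factor(generations, [50, 49, 51, 50, 50, 49, 51], depth)
print(f"rising allele: {rising:.2f}, flat allele: {flat:.2f}")
```

Ranking features by such a score places alleles with a consistent frequency trend above alleles whose fluctuations are explained by sampling noise alone. Replicate populations could be handled by concatenating their time points into one series, which is again an assumption of this sketch rather than a description of the published software.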

AVAILABILITY AND IMPLEMENTATION

R software implementing the test is available at https://github.com/handetopa/BBGP.
