A simulation framework for correlated count data of features subsets in high-throughput sequencing or proteomics experiments.

Authors

Kruppa Jochen, Kramer Frank, Beißbarth Tim, Jung Klaus

Publication information

Stat Appl Genet Mol Biol. 2016 Oct 1;15(5):401-414. doi: 10.1515/sagmb-2015-0082.

DOI:10.1515/sagmb-2015-0082
PMID:27655448
Abstract

As part of the data processing of high-throughput-sequencing experiments count data are produced representing the amount of reads that map to specific genomic regions. Count data also arise in mass spectrometric experiments for the detection of protein-protein interactions. For evaluating new computational methods for the analysis of sequencing count data or spectral count data from proteomics experiments artificial count data is thus required. Although, some methods for the generation of artificial sequencing count data have been proposed, all of them simulate single sequencing runs, omitting thus the correlation structure between the individual genomic features, or they are limited to specific structures. We propose to draw correlated data from the multivariate normal distribution and round these continuous data in order to obtain discrete counts. In our approach, the required distribution parameters can either be constructed in different ways or estimated from real count data. Because rounding affects the correlation structure we evaluate the use of shrinkage estimators that have already been used in the context of artificial expression data from DNA microarrays. Our approach turned out to be useful for the simulation of counts for defined subsets of features such as individual pathways or GO categories.
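
The approach described in the abstract (estimate distribution parameters, optionally shrink the covariance, draw from a multivariate normal, round to counts) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation. The function names, the fixed shrinkage intensity, the diagonal shrinkage target, and the truncation of negative values to zero are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np


def shrink_covariance(sample_cov, intensity):
    """Linearly shrink a covariance matrix toward its own diagonal.

    Assumption: a fixed, user-chosen intensity in [0, 1] and a diagonal
    target. The shrinkage estimators evaluated in the paper may differ.
    """
    target = np.diag(np.diag(sample_cov))
    return (1.0 - intensity) * sample_cov + intensity * target


def simulate_correlated_counts(mean, cov, n_samples, seed=0):
    """Draw multivariate normal data and round it to non-negative integers."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_samples)
    counts = np.rint(draws)            # round to the nearest integer
    counts = np.clip(counts, 0, None)  # assumption: negative values become 0
    return counts.astype(np.int64)


# Hypothetical parameters for a subset of three features (e.g. one pathway).
mean = np.array([100.0, 200.0, 50.0])
sd = np.array([10.0, 20.0, 5.0])
corr = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.1],
    [0.2, 0.1, 1.0],
])
cov = corr * np.outer(sd, sd)

counts = simulate_correlated_counts(mean, cov, n_samples=500)

# Parameters can also be estimated from an existing count matrix
# (rows = samples, columns = features) and then reused for simulation.
est_mean = counts.mean(axis=0)
est_cov = shrink_covariance(np.cov(counts, rowvar=False), intensity=0.1)
resimulated = simulate_correlated_counts(est_mean, est_cov, n_samples=500, seed=1)
```

Comparing `np.corrcoef(counts, rowvar=False)` with `corr` gives a quick check of how much rounding and truncation distort the target correlation structure. The distortion tends to grow when means are small relative to the variances, because more draws are then clipped at zero.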

Similar articles

1. A simulation framework for correlated count data of features subsets in high-throughput sequencing or proteomics experiments.
   Stat Appl Genet Mol Biol. 2016 Oct 1;15(5):401-414. doi: 10.1515/sagmb-2015-0082.
2. Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.
   Prog Brain Res. 2006;158:83-108. doi: 10.1016/S0079-6123(06)58004-5.
3. An analytical workflow for accurate variant discovery in highly divergent regions.
   BMC Genomics. 2016 Sep 2;17(1):703. doi: 10.1186/s12864-016-3045-z.
4. Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2.
   Nucleic Acids Res. 2016 Nov 16;44(20):e154. doi: 10.1093/nar/gkw695. Epub 2016 Aug 9.
5. RRBS-analyser: a comprehensive web server for reduced representation bisulfite sequencing data analysis.
   Hum Mutat. 2013 Dec;34(12):1606-10. doi: 10.1002/humu.22444. Epub 2013 Oct 10.
6. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models.
   PLoS One. 2016 Nov 28;11(11):e0167047. doi: 10.1371/journal.pone.0167047. eCollection 2016.
7. A multi-model statistical approach for proteomic spectral count quantitation.
   J Proteomics. 2016 Jul 20;144:23-32. doi: 10.1016/j.jprot.2016.05.032. Epub 2016 May 31.
8. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.
   BMC Genomics. 2016 Nov 22;17(1):958. doi: 10.1186/s12864-016-3278-x.
9. A survey of copy-number variation detection tools based on high-throughput sequencing data.
   Curr Protoc Hum Genet. 2012 Oct;Chapter 7:Unit7.19. doi: 10.1002/0471142905.hg0719s75.
10. Big Data in Plant Science: Resources and Data Mining Tools for Plant Genomics and Proteomics.
    Methods Mol Biol. 2016;1415:533-47. doi: 10.1007/978-1-4939-3572-7_27.

Cited by

1. Information-incorporated gene network construction with FDR control.
   Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae125.
2. Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin.
   Int J Mol Sci. 2022 Feb 24;23(5):2481. doi: 10.3390/ijms23052481.