Improving Cancer Gene Expression Data Quality through a TCGA Data-Driven Evaluation of Identifier Filtering.

Authors

McDade Kevin K, Chandran Uma, Day Roger S

Affiliations

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA; Department of Science, The Pennsylvania State University, Shenango Campus, Sharon, PA, USA.

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

Cancer Inform. 2015 Dec 16;14:149-61. doi: 10.4137/CIN.S33076. eCollection 2015.

DOI: 10.4137/CIN.S33076
PMID: 26715829
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4686346/
Abstract

Data quality is a recognized problem for high-throughput genomics platforms, as evinced by the proliferation of methods attempting to filter out lower quality data points. Different filtering methods lead to discordant results, raising the question, which methods are best? Astonishingly, little computational support is offered to analysts to decide which filtering methods are optimal for the research question at hand. To evaluate them, we begin with a pair of expression data sets, transcriptomic and proteomic, on the same samples. The pair of data sets form a test-bed for the evaluation. Identifier mapping between the data sets creates a collection of feature pairs, with correlations calculated for each pair. To evaluate a filtering strategy, we estimate posterior probabilities for the correctness of probesets accepted by the method. An analyst can set expected utilities that represent the trade-off between the quality and quantity of accepted features. We tested nine published probeset filtering methods and combination strategies. We used two test-beds from cancer studies providing transcriptomic and proteomic data. For reasonable utility settings, the Jetset filtering method was optimal for probeset filtering on both test-beds, even though both assay platforms were different. Further intersection with a second filtering method was indicated on one test-bed but not the other.
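The evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-component normal mixture over the mRNA-protein correlations with fixed, hypothetical parameters (`prior`, `mean_good`, `mean_bad`, `sd`), and a simple linear utility with hypothetical weights `gain_true` and `loss_false`. The probeset names, correlations, and filter names are invented for illustration only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(r, prior=0.5, mean_good=0.5, mean_bad=0.0, sd=0.2):
    """Posterior probability that a feature pair with correlation r is a
    correct mapping, under an ASSUMED two-component normal mixture.
    All parameter values here are hypothetical, not taken from the paper."""
    good = prior * normal_pdf(r, mean_good, sd)
    bad = (1.0 - prior) * normal_pdf(r, mean_bad, sd)
    return good / (good + bad)


def expected_utility(correlations, accepted, gain_true=1.0, loss_false=1.0):
    """Sum, over accepted features, of the expected gain from a correct
    feature minus the expected loss from an incorrect one."""
    total = 0.0
    for feature in accepted:
        p = posterior_correct(correlations[feature])
        total += p * gain_true - (1.0 - p) * loss_false
    return total


# Invented correlations between transcript and protein levels per probeset.
correlations = {"ps1": 0.62, "ps2": 0.05, "ps3": 0.41, "ps4": -0.10}

# Hypothetical filtering methods, each described by the probesets it accepts.
filters = {
    "no_filter": set(correlations),
    "filter_a": {"ps1", "ps3"},
    "filter_b": {"ps1", "ps2"},
}

scores = {name: expected_utility(correlations, acc) for name, acc in filters.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Raising `loss_false` relative to `gain_true` favors stricter filters that accept fewer, higher-quality probesets, while lowering it favors quantity; this is the quality-versus-quantity trade-off the abstract says an analyst can set.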
