A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database.

Authors

Katz Simon, Irizarry Rafael A, Lin Xue, Tripputi Mark, Porter Mark W

Affiliation

Gene Logic Inc., 610 Professional Dr, Gaithersburg, MD, 20876, USA.

Publication information

BMC Bioinformatics. 2006 Oct 23;7:464. doi: 10.1186/1471-2105-7-464.

DOI:10.1186/1471-2105-7-464
PMID:17059591
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1624855/
Abstract

BACKGROUND

Many of the most popular pre-processing methods for Affymetrix expression arrays, such as RMA, gcRMA, and PLIER, simultaneously analyze data across a set of predetermined arrays to improve the precision of the final expression measures. One problem with these algorithms is that the expression measurements for a particular sample depend heavily on the set of samples used for normalization, so results obtained by normalizing with a different set may not be comparable. A related problem is that an organization producing and/or storing large amounts of data sequentially must either re-run the pre-processing algorithm every time an array is added or store the arrays in batches that are pre-processed together. Furthermore, pre-processing large numbers of arrays requires loading all the feature-level data into memory, which is difficult even with modern computers. We use a scheme that derives all the information necessary for pre-processing from a very large training set; this information can then be used to summarize samples outside the training set, and all subsequent pre-processing can be done one array at a time. We demonstrate the utility of this approach by defining a new version of the Robust Multi-chip Averaging (RMA) algorithm, which we refer to as refRMA.
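
The single-array idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the stored training information consists of a reference quantile distribution and one effect per probe, it replaces RMA's median polish with simple medians, and it assumes background correction and log2 transformation have already been applied. The function names and the `probe_sets` mapping are hypothetical.

```python
import numpy as np


def normalize_to_reference(array_log2, ref_quantiles):
    """Map one array onto the frozen reference distribution by rank."""
    # Rank of each probe within this array (0 = smallest intensity).
    ranks = np.argsort(np.argsort(array_log2))
    return ref_quantiles[ranks]


def train_reference(train_log2, probe_sets):
    """Derive the frozen parameters from a training matrix.

    train_log2: (n_arrays, n_probes) log2 intensities (assumed input layout).
    probe_sets: dict mapping a probe set name to a list of probe column indices.
    """
    # Reference distribution: mean of the sorted intensities across arrays.
    ref_quantiles = np.sort(train_log2, axis=1).mean(axis=0)
    normalized = np.vstack(
        [normalize_to_reference(a, ref_quantiles) for a in train_log2]
    )
    probe_effects = np.zeros(train_log2.shape[1])
    for idx in probe_sets.values():
        block = normalized[:, idx]
        # Simplification: probe effect = probe median minus probe set median
        # (a stand-in for the probe effects a median polish would estimate).
        probe_effects[idx] = np.median(block, axis=0) - np.median(block)
    return ref_quantiles, probe_effects


def summarize_array(array_log2, ref_quantiles, probe_effects, probe_sets):
    """Summarize a single new array using only the frozen parameters."""
    norm = normalize_to_reference(array_log2, ref_quantiles)
    return {
        name: float(np.median(norm[idx] - probe_effects[idx]))
        for name, idx in probe_sets.items()
    }


if __name__ == "__main__":
    probe_sets = {"a": [0, 1], "b": [2, 3]}
    train = np.array([[1.0, 2.0, 3.0, 4.0],
                      [2.0, 3.0, 4.0, 5.0],
                      [3.0, 4.0, 5.0, 6.0]])
    ref_q, effects = train_reference(train, probe_sets)
    new_array = np.array([10.0, 20.0, 30.0, 40.0])
    print(summarize_array(new_array, ref_q, effects, probe_sets))
```

Because `summarize_array` reads only the frozen parameters, the values for a new array do not change when other arrays are added later, and only one array needs to be in memory at a time.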

RESULTS

We assess performance using multiple sets of samples run on HG U133A Affymetrix GeneChip arrays. We show that the refRMA workflow, when used with a large, biologically diverse training set, yields the same general characteristics as classic RMA in overall data structure, sample-to-sample correlation, and variation. Further, we demonstrate that the refRMA workflow and reference set can be applied robustly to organ types absent from the training set and to benchmark data, where it performs respectably.

CONCLUSION

Our results indicate that a biologically diverse reference database can be used to train a model for estimating probe set intensities in test sets that share no samples with the training set, while retaining the overall characteristics of the base algorithm. Although the results we present are specific to RMA, similar versions of other multi-array normalization and summarization schemes can be developed.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31d2/1624855/ad10272456bd/1471-2105-7-464-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31d2/1624855/b041e0b952fe/1471-2105-7-464-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31d2/1624855/29d5575e22ab/1471-2105-7-464-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31d2/1624855/104e25e8000c/1471-2105-7-464-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31d2/1624855/56c0be364926/1471-2105-7-464-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31d2/1624855/e2fc46f035ee/1471-2105-7-464-6.jpg