A primer on gene expression and microarrays for machine learning researchers.

Authors

Kuo Winston Patrick, Kim Eun-Young, Trimarchi Jeff, Jenssen Tor-Kristian, Vinterbo Staal A, Ohno-Machado Lucila

Affiliation

Decision Systems Group, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Publication details

J Biomed Inform. 2004 Aug;37(4):293-303. doi: 10.1016/j.jbi.2004.07.002.

DOI:10.1016/j.jbi.2004.07.002
PMID:15465482
Abstract

Data originating from biomedical experiments has provided machine learning researchers with an important source of motivation for developing and evaluating new algorithms. A new wave of algorithmic development has been initiated with the publication of gene expression data derived from microarrays. Microarray data analysis is particularly challenging given the large number of measurements (typically in the order of thousands) that are reported for relatively few samples (typically in the order of dozens). Many data sets are now available on the web. It is important that machine learning researchers understand how data are obtained and which assumptions are necessary in the analysis. Microarray data have the potential to cause significant impact in machine learning research, not just as a rich and realistic source of cases for testing new algorithms, as has been the UCI machine learning repository in the past decades, but also as a main motivation for their development. In this article, we briefly review the biology underlying microarrays, the process of obtaining gene expression measurements, and the rationale behind the common types of analyses involved in a microarray experiment. We outline the main challenges and reiterate critical considerations regarding the construction of supervised learning models that use this type of data. The goal of this article is to familiarize machine learning researchers with data originated from gene expression microarrays.
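The challenge the abstract describes, many measurements for few samples, can be illustrated with a small simulation. The sketch below is not taken from the article: the perceptron, the noise data, the sizes (500 simulated "genes", 20 training samples, scaled down from the thousands and dozens mentioned above so it runs quickly), and all function names are assumptions chosen for illustration. It fits a linear classifier to pure noise with random labels and compares accuracy on the training samples with accuracy on fresh samples.

```python
import random


def make_noise_dataset(n_samples, n_genes, rng):
    """Return (X, y): Gaussian noise 'expression' values and random +/-1 labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [rng.choice((-1, 1)) for _ in range(n_samples)]
    return X, y


def predict(w, x):
    """Classify one sample by the sign of its linear score."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else -1


def train_perceptron(X, y, max_epochs=100):
    """Plain perceptron: add label * x to the weights on every mistake."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if predict(w, x) != label:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted label matches the given label."""
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_noise_dataset(20, 500, rng)   # few samples, many features
X_test, y_test = make_noise_dataset(200, 500, rng)    # fresh noise, fresh labels

w = train_perceptron(X_train, y_train)
train_acc = accuracy(w, X_train, y_train)
test_acc = accuracy(w, X_test, y_test)
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

Because there are far more features than samples, the classifier can separate the training labels even though they carry no signal, while accuracy on the fresh samples should stay near chance. This is one reason that performance measured on the training samples alone says little about a model built from this kind of data.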

Similar articles

1
A primer on gene expression and microarrays for machine learning researchers.
J Biomed Inform. 2004 Aug;37(4):293-303. doi: 10.1016/j.jbi.2004.07.002.
2
Tumor classification ranking from microarray data.
BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.
3
A review of feature selection techniques in bioinformatics.
Bioinformatics. 2007 Oct 1;23(19):2507-17. doi: 10.1093/bioinformatics/btm344. Epub 2007 Aug 24.
4
Identifying projected clusters from gene expression profiles.
J Biomed Inform. 2004 Oct;37(5):345-57. doi: 10.1016/j.jbi.2004.05.002.
5
Considerations when using the significance analysis of microarrays (SAM) algorithm.
BMC Bioinformatics. 2005 May 29;6:129. doi: 10.1186/1471-2105-6-129.
6
Methods for labeling error detection in microarrays based on the effect of data perturbation on the regression model.
Bioinformatics. 2009 Oct 15;25(20):2708-14. doi: 10.1093/bioinformatics/btp478. Epub 2009 Aug 6.
7
Ensemble machine learning on gene expression data for cancer classification.
Appl Bioinformatics. 2003;2(3 Suppl):S75-83.
8
Pathway analysis using random forests classification and regression.
Bioinformatics. 2006 Aug 15;22(16):2028-36. doi: 10.1093/bioinformatics/btl344. Epub 2006 Jun 29.
9
Graph-based consensus clustering for class discovery from gene expression data.
Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.
10
Independent component analysis-based penalized discriminant method for tumor classification using gene expression data.
Bioinformatics. 2006 Aug 1;22(15):1855-62. doi: 10.1093/bioinformatics/btl190. Epub 2006 May 18.

Cited by

1
Utilization of Computer Classification Methods for Exposure Prediction and Gene Selection in Toxicogenomics.
Biology (Basel). 2023 May 9;12(5):692. doi: 10.3390/biology12050692.
2
A Python Clustering Analysis Protocol of Genes Expression Data Sets.
Genes (Basel). 2022 Oct 12;13(10):1839. doi: 10.3390/genes13101839.
3
Ten simple rules for organizing a special session at a scientific conference.
PLoS Comput Biol. 2022 Aug 25;18(8):e1010395. doi: 10.1371/journal.pcbi.1010395. eCollection 2022 Aug.
4
Effective feature selection framework for cluster analysis of microarray data.
Bioinformation. 2010 Feb 28;4(8):385-9. doi: 10.6026/97320630004385.
5
A Marfan syndrome gene expression phenotype in cultured skin fibroblasts.
BMC Genomics. 2007 Sep 12;8:319. doi: 10.1186/1471-2164-8-319.
6
Ethanol sensitivity: a central role for CREB transcription regulation in the cerebellum.
BMC Genomics. 2006 Dec 5;7:308. doi: 10.1186/1471-2164-7-308.
7
Metabolic engineering in the -omics era: elucidating and modulating regulatory networks.
Microbiol Mol Biol Rev. 2005 Jun;69(2):197-216. doi: 10.1128/MMBR.69.2.197-216.2005.