A primer on gene expression and microarrays for machine learning researchers.

Authors

Kuo Winston Patrick, Kim Eun-Young, Trimarchi Jeff, Jenssen Tor-Kristian, Vinterbo Staal A, Ohno-Machado Lucila

Affiliation

Decision Systems Group, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Publication information

J Biomed Inform. 2004 Aug;37(4):293-303. doi: 10.1016/j.jbi.2004.07.002.

Abstract

Data originating from biomedical experiments have provided machine learning researchers with an important source of motivation for developing and evaluating new algorithms. The publication of gene expression data derived from microarrays has initiated a new wave of algorithmic development. Microarray data analysis is particularly challenging because a large number of measurements (typically on the order of thousands) are reported for relatively few samples (typically on the order of dozens). Many data sets are now available on the web. It is important that machine learning researchers understand how the data are obtained and which assumptions are necessary in the analysis. Microarray data have the potential to make a significant impact on machine learning research, not just as a rich and realistic source of cases for testing new algorithms, as the UCI machine learning repository has been in past decades, but also as a main motivation for their development. In this article, we briefly review the biology underlying microarrays, the process of obtaining gene expression measurements, and the rationale behind the common types of analyses involved in a microarray experiment. We outline the main challenges and reiterate critical considerations regarding the construction of supervised learning models that use this type of data. The goal of this article is to familiarize machine learning researchers with data originating from gene expression microarrays.
