Microarray and EST database estimates of mRNA expression levels differ: the protein length versus expression curve for C. elegans.

Author information

Munoz Enrique T, Bogarad Leonard D, Deem Michael W

Affiliation

Department of Bioengineering, Rice University, Houston, TX 77005-1892 USA.

Publication information

BMC Genomics. 2004 May 10;5(1):30. doi: 10.1186/1471-2164-5-30.

Abstract

BACKGROUND

Various methods for estimating protein expression levels are known. The correlation between these methods is only fair, and systematic biases in each of them cannot be ruled out. Here we investigate systematic biases in the estimation of gene expression rates from microarray data and from abundance within the Expressed Sequence Tag (EST) database. We suggest that gene length is a significant source of bias in measured gene expression rates. As a specific example of why this length-dependent bias matters, we address the following evolutionary question: does the average C. elegans protein length increase or decrease with expression level? Two different answers to this question have been reported in the literature, one using expression levels estimated from abundance within the EST database and the other using microarrays. We have investigated this issue by constructing the full protein length versus expression curve for C. elegans, using both methods for estimating expression levels.
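The abstract does not say how the length versus expression curve is binned. As a minimal sketch, assuming equal-count bins on expression rank and strictly positive expression values, one could rank genes by expression, group them, and average protein length within each group. The function names `length_vs_expression_curve` and `is_monotonic_decreasing` and the toy data are hypothetical, not taken from the paper.

```python
import math


def length_vs_expression_curve(lengths, expression, n_bins=10):
    """Return (mean log10 expression, mean length) for each expression bin.

    Genes are ranked by expression and split into n_bins groups of nearly
    equal size. This equal-count binning is an illustrative assumption, not
    necessarily the binning used in the paper. Expression values must be > 0.
    """
    if len(lengths) != len(expression):
        raise ValueError("lengths and expression must have the same size")
    if not 0 < n_bins <= len(lengths):
        raise ValueError("n_bins must be between 1 and the number of genes")
    if any(e <= 0 for e in expression):
        raise ValueError("expression values must be positive to take log10")
    # Sort (expression, length) pairs so that each bin holds similar expression.
    genes = sorted(zip(expression, lengths))
    n = len(genes)
    curve = []
    for b in range(n_bins):
        # Integer bin edges give bin sizes that differ by at most one gene.
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        chunk = genes[start:stop]
        mean_log_expr = sum(math.log10(e) for e, _ in chunk) / len(chunk)
        mean_length = sum(length for _, length in chunk) / len(chunk)
        curve.append((mean_log_expr, mean_length))
    return curve


def is_monotonic_decreasing(values):
    """True if every value is strictly smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Hypothetical toy data (protein lengths in residues, arbitrary expression units).
lengths = [900, 800, 700, 600, 500, 400]
expression = [1, 2, 4, 8, 16, 32]
curve = length_vs_expression_curve(lengths, expression, n_bins=3)
mean_lengths = [length for _, length in curve]
print(mean_lengths)                           # [850.0, 650.0, 450.0]
print(is_monotonic_decreasing(mean_lengths))  # True
```

Applying the same function once with microarray-based and once with EST-based expression values gives two curves whose shapes (monotonic or not) can then be compared directly.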

RESULTS

The microarray data show a monotonic decrease of length with expression level, whereas the EST-abundance data show non-monotonic behavior. Furthermore, the ratio of the expression level estimated from the EST database to that measured by microarrays is not constant but is systematically biased with gene length.
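One simple way to check whether the EST-to-microarray ratio depends on length is to regress the log of the ratio on the log of gene length: a slope near zero means the ratio is roughly constant, while a clearly nonzero slope signals a length bias. This is a sketch assuming a power-law form for the bias; the abstract does not state which functional form the authors fitted. The names `fit_line` and `ratio_length_trend` and the data are hypothetical, and the direction of the bias in the toy data is arbitrary.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ratio_length_trend(lengths, est_level, array_level):
    """Regress log10(EST estimate / microarray estimate) on log10(gene length).

    A slope near 0 means the ratio is roughly constant across lengths; a
    clearly nonzero slope indicates a systematic length bias between methods.
    """
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(e / m) for e, m in zip(est_level, array_level)]
    return fit_line(xs, ys)


# Hypothetical data: the direction and size of the bias are arbitrary here.
lengths = [100, 1000, 10000, 100000]
array_level = [10, 10, 10, 10]
est_level = [20, 2, 0.2, 0.02]
intercept, slope = ratio_length_trend(lengths, est_level, array_level)
print(round(slope, 6))  # -1.0
```

Working on log scales keeps the fit insensitive to the arbitrary units of each method, since a constant scale factor between methods only shifts the intercept, not the slope.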

CONCLUSIONS

We suggest that the length bias may lie primarily in the EST-abundance method, where it is not corrected by internal standards as it is in the microarray data, and that this bias should be removed before the data are interpreted. Once it is removed, both the microarray and the EST-abundance data show a monotonic decrease of spliced length with expression level, and the correlation between the EST and microarray data increases. We suggest that standard RNA controls be used to normalize for length bias in any method that measures expression.
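The recommendation to normalize with standard RNA controls can be sketched as follows, assuming spike-in controls of known abundance and different lengths are measured by the same method, and assuming the bias follows a power law in length. This is not the correction the authors applied (the abstract does not describe it); `length_correction_from_controls`, `biased`, and all numbers are hypothetical. In the toy data the microarray-like values are taken as unbiased and the EST-like values carry a square-root length bias.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def length_correction_from_controls(ctrl_lengths, ctrl_measured, ctrl_known):
    """Build a corrector from spike-in RNA controls of known abundance.

    Fits log10(measured / known) = a + b*log10(length) on the controls and
    returns a function that divides that predicted bias out of a measurement.
    """
    xs = [math.log10(length) for length in ctrl_lengths]
    ys = [math.log10(m / k) for m, k in zip(ctrl_measured, ctrl_known)]
    a, b = fit_line(xs, ys)

    def correct(length, measured):
        return measured / 10 ** (a + b * math.log10(length))

    return correct


def biased(level, length):
    """Hypothetical length bias: readout scales with sqrt(length / 2000)."""
    return level * math.sqrt(length / 2000)


def log10_all(values):
    return [math.log10(v) for v in values]


# Spike-in controls: equal known amounts at different lengths (hypothetical).
ctrl_lengths = [200, 2000, 20000]
ctrl_known = [5, 5, 5]
ctrl_measured = [biased(k, L) for k, L in zip(ctrl_known, ctrl_lengths)]
correct = length_correction_from_controls(ctrl_lengths, ctrl_measured, ctrl_known)

# Genes: microarray-like values taken as unbiased, EST-like values biased.
gene_lengths = [500, 1000, 4000, 16000]
array_level = [8, 4, 2, 1]
est_level = [biased(x, L) for x, L in zip(array_level, gene_lengths)]
est_fixed = [correct(L, m) for L, m in zip(gene_lengths, est_level)]

r_before = pearson(log10_all(est_level), log10_all(array_level))
r_after = pearson(log10_all(est_fixed), log10_all(array_level))
print(round(r_before, 2))  # 0.77
print(round(r_after, 2))   # 1.0
```

On these synthetic numbers the correlation between the two methods rises after the length correction, which matches the qualitative claim above; real data would of course not reach a perfect correlation, and the fitted bias model would need checking against the controls.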


