Quality Weighted Mean and T-test in Microarray Analysis Lead to Improved Accuracy in Gene Expression Measurements and Reduced Type I and II Errors in Differential Expression Detection.

Author information

Gao Shouguo, Jia Shuang, Hessner Martin, Wang Xujing

Affiliations

Department of Physics & the Comprehensive Diabetes Center, University of Alabama at Birmingham, 1300 University Blvd, Birmingham, AL 35294, USA.

Publication information

J Comput Sci Syst Biol. 2008 Dec 26;1:41. doi: 10.4172/jcsb.1000003.

DOI:10.4172/jcsb.1000003

PMID:20151041

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2819534/

Abstract

We previously reported a microarray image processing and data analysis package, Matarray, in which a quality score is defined for every spot to reflect the reliability and variability of the data acquired from that spot. In this article we present a new development in Matarray, in which the quality scores are incorporated as weights in the statistical evaluation and data mining of microarray data. With this approach, poor-quality data are filtered automatically through the reduction in their weights, which eliminates both the need to manually flag or remove bad data points and the problem of missing values. More significantly, using a set of control clones spiked in at known input ratios ranging from 1:30 to 30:1, we find that the quality-weighted statistics lead to more accurate gene expression measurements and more sensitive detection of their changes, with significantly lower type II error rates. Further, we have applied quality-weighted clustering to a time-course microarray data set and find that the new algorithm improves grouping accuracy. In summary, incorporating a quantitative quality measure of microarray data as a weight in complex data analysis leads to improved reliability and convenience. In addition, it provides a practical way to deal with the missing value issue in establishing automatic statistical tests.
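
The abstract does not give the formulas used in Matarray, so the following is only a minimal sketch of the general idea of a quality-weighted mean and a weighted one-sample t-statistic. The function names, the use of the Kish effective sample size, and the example numbers are all assumptions made for illustration; they are not taken from the paper. With equal weights the sketch reduces to the ordinary mean and t-statistic, and a spot with weight 0 drops out without having to be removed by hand.

```python
import math


def weighted_mean(values, weights):
    """Quality-weighted mean; spots with weight 0 contribute nothing."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for x, w in zip(values, weights)) / total


def weighted_t_statistic(values, weights, mu0=0.0):
    """Weighted one-sample t-statistic against mu0 (illustrative only).

    Assumption: the effective sample size is the Kish estimate
    (sum w)^2 / sum(w^2). This is a common convention, not
    necessarily the one used in Matarray.
    Returns (t, degrees_of_freedom).
    """
    total = sum(weights)
    mean = weighted_mean(values, weights)
    n_eff = total ** 2 / sum(w * w for w in weights)
    if n_eff <= 1:
        raise ValueError("effective sample size must exceed 1")
    # Weighted variance with a small-sample correction based on n_eff.
    biased_var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    var = biased_var * n_eff / (n_eff - 1)
    if var == 0:
        raise ValueError("zero variance; t-statistic is undefined")
    std_err = math.sqrt(var / n_eff)
    return (mean - mu0) / std_err, n_eff - 1


# Hypothetical log-ratios from four replicate spots; the last spot is
# an outlier that received quality weight 0.
log_ratios = [1.0, 1.2, 0.8, 5.0]
quality = [1.0, 1.0, 1.0, 0.0]

mean = weighted_mean(log_ratios, quality)
t_stat, dof = weighted_t_statistic(log_ratios, quality)
print(mean, t_stat, dof)
```

In this sketch the zero-weight spot changes neither the mean nor the degrees of freedom, which is the sense in which down-weighting replaces manual flagging and sidesteps missing values.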
