DeConde Robert P, Hawley Sarah, Falcon Seth, Clegg Nigel, Knudsen Beatrice, Etzioni Ruth
Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA.
Stat Appl Genet Mol Biol. 2006;5:Article15. doi: 10.2202/1544-6115.1204. Epub 2006 Jun 20.
As technology for microarray analysis becomes widespread, it is increasingly important to be able to compare and combine the results of experiments that explore the same scientific question. In this article, we present a rank-aggregation approach for combining results from several microarray studies. The motivation for this approach is twofold: first, the final results of microarray studies are typically expressed as lists of genes, rank-ordered by a measure of the strength of evidence that they are functionally involved in the disease process; second, working with this rank ordering means that we do not have to rely on the actual expression levels, which may not be comparable across experiments. Our approach draws on methods for combining top-k lists from the computer science literature on meta-search. The meta-search problem shares several important features with that of combining microarray experiments, including the fact that there are typically few lists with many elements and that the elements may not be common to all lists. We implement two meta-search algorithms, which use a Markov chain framework to convert pairwise preferences between list elements into a stationary distribution that represents an aggregate ranking (Dwork et al., 2001). We explore the behavior of the algorithms in hypothetical examples and a simulated dataset, and compare their performance with that of an algorithm based on the order-statistics model of Thurstone (1927). We apply all three algorithms to aggregate the results of five microarray studies of prostate cancer.
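The Markov chain idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it follows the general shape of the majority-rule chain described by Dwork et al. (2001), and several choices are assumptions made here for illustration. In particular, the function name `mc_aggregate`, the rule that two genes are compared only in the lists that rank both of them, and the small uniform `damping` term (added so that the chain has a unique stationary distribution) are all hypothetical.

```python
from itertools import chain


def mc_aggregate(lists, damping=0.05, max_iter=10000, tol=1e-12):
    """Aggregate ranked lists (best item first) with a majority-rule Markov chain.

    Assumptions of this sketch: items are compared only in lists that rank
    both of them, and a uniform teleport step with probability `damping`
    keeps the chain ergodic.
    """
    items = sorted(set(chain.from_iterable(lists)))
    n = len(items)
    # Rank position of every item within each list (0 = top of the list).
    pos = [{g: r for r, g in enumerate(lst)} for lst in lists]

    def prefers(q, p):
        # True if a strict majority of the lists containing both p and q
        # rank q above p; False when no list contains both.
        both = [(d[q], d[p]) for d in pos if q in d and p in d]
        wins = sum(1 for rq, rp in both if rq < rp)
        return 2 * wins > len(both)

    # From state p, propose q uniformly at random; move only if q is preferred.
    trans = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(items):
        for j, q in enumerate(items):
            if i != j and prefers(q, p):
                trans[i][j] = 1.0 / n
        trans[i][i] = 1.0 - sum(trans[i])
    # Mix in the uniform teleport step.
    trans = [[(1.0 - damping) * x + damping / n for x in row] for row in trans]

    # Power iteration for the stationary distribution.
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * trans[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break

    scores = dict(zip(items, pi))
    # Higher stationary probability means a better aggregate rank.
    ranking = sorted(items, key=lambda g: (-scores[g], g))
    return ranking, scores


if __name__ == "__main__":
    # Three hypothetical top-2 lists that do not share all of their genes.
    studies = [["g1", "g2"], ["g2", "g3"], ["g1", "g3"]]
    ranking, scores = mc_aggregate(studies)
    print(ranking)
```

Because the chain moves only toward items that win pairwise comparisons, probability mass accumulates on genes that are consistently ranked near the top, and sorting by stationary probability gives the aggregate list. The dense transition matrix is kept for readability and would need a sparse representation for lists with thousands of genes.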