Borrowing information from relevant microarray studies for sample classification using weighted partial least squares.

Authors

Huang Xiaohong, Pan Wei, Han Xinqiang, Chen Yingjie, Miller Leslie W, Hall Jennifer

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building (MMC 303), Minneapolis, MN 55455-0378, USA.

Publication information

Comput Biol Chem. 2005 Jun;29(3):204-11. doi: 10.1016/j.compbiolchem.2005.04.002.

DOI:10.1016/j.compbiolchem.2005.04.002

PMID:15979040

Abstract

With an increasing number of publicly available microarray datasets, it becomes attractive to borrow information from other relevant studies to have more reliable and powerful analysis of a given dataset. We do not assume that subjects in the current study and other relevant studies are drawn from the same population as assumed by meta-analysis. In particular, the set of parameters in the current study may be different from that of the other studies. We consider sample classification based on gene expression profiles in this context. We propose two new methods, a weighted partial least squares (WPLS) method and a weighted penalized partial least squares (WPPLS) method, to build a classifier by a combined use of multiple datasets. The methods can weight the individual datasets depending on their relevance to the current study. A more standard approach is first to build a classifier using each of the individual datasets, then to combine the outputs of the multiple classifiers using a weighted voting. Using two quite different datasets on human heart failure, we show first that WPLS/WPPLS, by borrowing information from the other dataset, can improve the performance of PLS/PPLS built on only a single dataset. Second, WPLS/WPPLS performs better than the standard approach of combining multiple classifiers. Third, WPPLS can improve over WPLS, just as PPLS does over PLS for a single dataset.
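
The abstract does not give the WPLS algorithm itself, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that "weighting a dataset" means giving every sample from that dataset the same non-negative weight in a single-response PLS fit (PLS1), and that class labels are coded 0/1 with a 0.5 decision threshold. The function names `weighted_pls1_fit`, `pls_predict`, and `weighted_vote` are hypothetical; `weighted_vote` illustrates the comparison approach of combining the outputs of separately built classifiers.

```python
import numpy as np


def weighted_pls1_fit(X, y, sample_weights, n_components=2):
    """Fit PLS1 with per-sample weights (assumed reading of 'WPLS').

    X: (n_samples, n_genes) expression matrix stacked over all datasets.
    y: (n_samples,) response, e.g. class labels coded 0/1.
    sample_weights: (n_samples,) positive weights, e.g. 1.0 for the
        current study and a smaller value for a less relevant study.
    Returns (beta, x_mean, y_mean) for linear prediction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.asarray(sample_weights, dtype=float)
    d = d / d.sum()                      # normalize weights to sum to 1

    x_mean = d @ X                       # weighted column means
    y_mean = d @ y                       # weighted response mean
    Xk = X - x_mean
    yk = y - y_mean

    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ (d * yk)              # weighted gene/response covariance
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                       # component scores
        tt = t @ (d * t)                 # weighted squared length of scores
        p = Xk.T @ (d * t) / tt          # X loadings
        qk = yk @ (d * t) / tt           # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - qk * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)

    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W = np.column_stack(W)
    P = np.column_stack(P)
    beta = W @ np.linalg.solve(P.T @ W, np.array(q))
    return beta, x_mean, y_mean


def pls_predict(X_new, beta, x_mean, y_mean):
    """Continuous score; compare with 0.5 for a 0/1 class call (assumption)."""
    return (np.asarray(X_new, dtype=float) - x_mean) @ beta + y_mean


def weighted_vote(scores, classifier_weights):
    """Combine per-classifier scores (n_classifiers, n_samples) by weighted average."""
    scores = np.asarray(scores, dtype=float)
    cw = np.asarray(classifier_weights, dtype=float)
    combined = cw @ scores / cw.sum()
    return (combined > 0.5).astype(int)
```

Under these assumptions, setting the weight of the borrowed dataset to zero would reduce the fit to PLS on the current study alone, and equal weights would correspond to simply pooling the datasets. How the paper actually chooses the weights, and how the penalty in WPPLS enters, is not stated in the abstract and is not modeled here.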
