Very Important Pool (VIP) genes--an application for microarray-based molecular signatures.

Author information

Su Zhenqiang, Hong Huixiao, Fang Hong, Shi Leming, Perkins Roger, Tong Weida

Affiliation

Center for Toxicoinformatics, National Center for Toxicological Research (NCTR), U.S. Food and Drug Administration (FDA), 3900 NCTR Road, Jefferson, AR 72079, USA.

Publication information

BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S9. doi: 10.1186/1471-2105-9-S9-S9.

Abstract

BACKGROUND

Advances in DNA microarray technology portend that microarray-based molecular signatures will eventually be used in clinical settings and personalized medicine. Deriving biomarkers is a large step beyond hypothesis generation and demands considerably more stringent accuracy in identifying the informative gene subsets that differentiate phenotypes. Because microarray data inherently contain few samples and replicates relative to the large number of genes, informative genes must be identified before a classifier is constructed. Improving the ability to identify such differentiating genes, however, remains a challenge in bioinformatics.

RESULTS

A new hybrid gene selection approach was investigated and tested on nine publicly available microarray datasets. The method identifies a Very Important Pool (VIP) of genes from the broad patterns in gene expression data. It follows a bagging (bootstrap aggregating) sampling principle: the arrays are repeatedly re-sampled, the most informative genes are identified in each re-sample, and the frequency with which each gene is selected across these repetitions is used to identify the VIP genes. Putative informative genes are selected with two methods, the t-statistic and discriminatory analysis. With the t-statistic, informative genes are identified by their p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes belong to pathways derived from the genes common to both the VIP and p-value ranking methods. Moreover, binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value-ranked genes in distinguishing different types of samples.
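
The resampling-and-frequency idea described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the number of re-samples, the per-re-sample cutoff, the frequency threshold, the number of principal components, or how the two selectors are combined, so n_boot, top_k, min_freq, n_comp and the helper names (t_scores, discrimination_power, vip_genes) are all hypothetical. The sketch also assumes a stratified bootstrap (arrays re-sampled within each class) and the SIMCA-style discrimination power DP_j = sqrt((s12^2 + s21^2) / (s11^2 + s22^2)), where s_ab is the residual SD of gene j when class-a arrays are fitted to the class-b PCA model. For the t-statistic, a pooled-variance test gives every gene the same degrees of freedom, so ranking by |t| is equivalent to ranking by p-value and no t-distribution lookup is needed.

```python
import numpy as np


def t_scores(X, y):
    """|t| from a pooled-variance (Student's) two-sample t-test for every gene.

    X is arrays x genes, y holds class labels 0/1. Every gene is tested on the
    same arrays, so all genes share n1 + n2 - 2 degrees of freedom and ranking
    by |t| gives the same order as ranking by two-sided p-value.
    """
    a, b = X[y == 0], X[y == 1]
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * a.var(axis=0, ddof=1)
              + (n2 - 1) * b.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.maximum(se, 1e-12)


def _residual_sd(X_fit, X_apply, n_comp):
    """Per-gene RMS residual of X_apply after projection onto a PCA model of X_fit."""
    mean = X_fit.mean(axis=0)
    _, _, vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    P = vt[:n_comp].T                     # genes x n_comp loading matrix
    Z = X_apply - mean                    # center with the model class mean
    E = Z - Z @ P @ P.T                   # part not explained by the class model
    return np.sqrt((E ** 2).mean(axis=0))


def discrimination_power(X, y, n_comp=2):
    """SIMCA-style discrimination power from disjoint per-class PCA models.

    Assumption: DP_j = sqrt((s12^2 + s21^2) / (s11^2 + s22^2)); no
    degrees-of-freedom corrections are applied to the residual SDs.
    """
    a, b = X[y == 0], X[y == 1]
    s11 = _residual_sd(a, a, n_comp)
    s22 = _residual_sd(b, b, n_comp)
    s12 = _residual_sd(b, a, n_comp)      # class-0 arrays, class-1 model
    s21 = _residual_sd(a, b, n_comp)      # class-1 arrays, class-0 model
    return np.sqrt((s12 ** 2 + s21 ** 2) / np.maximum(s11 ** 2 + s22 ** 2, 1e-12))


def vip_genes(X, y, score_fn, n_boot=100, top_k=50, min_freq=0.5, seed=0):
    """Bagging-style VIP selection: count how often each gene is in the top_k.

    n_boot, top_k and min_freq are hypothetical defaults, not the paper's values.
    """
    rng = np.random.default_rng(seed)
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    counts = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_boot):
        # Stratified bootstrap: resample arrays with replacement within each class.
        rows = np.concatenate([rng.choice(idx0, size=len(idx0)),
                               rng.choice(idx1, size=len(idx1))])
        scores = score_fn(X[rows], y[rows])
        counts[np.argsort(scores)[-top_k:]] += 1   # genes ranked top_k this round
    freq = counts / n_boot
    return np.flatnonzero(freq >= min_freq), freq


# Synthetic data: 40 arrays x 500 genes; genes 0-9 are shifted in class 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 500))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 3.0
vip_t, freq_t = vip_genes(X, y, t_scores)
vip_dp, freq_dp = vip_genes(X, y, discrimination_power)
print("t-statistic VIP genes:", vip_t)
print("DP VIP genes:", vip_dp)
```

On this synthetic example the ten shifted genes are expected to reach a high selection frequency under both scores; any other VIP members are noise genes that happened to score consistently well. Selecting on frequency across re-samples, rather than on a single ranking of the full data, is what lets the pool include genes whose rank is stable but not necessarily in the single-pass top 50.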

CONCLUSION

The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives, since they belong to pathways identified by the p-value ranking method and are therefore expected to be related to the relevant biology. These additional genes derived from the VIP method thus potentially provide valuable biological insights.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b38/2537560/518e04a3f714/1471-2105-9-S9-S9-1.jpg