Very Important Pool (VIP) genes--an application for microarray-based molecular signatures.

Authors

Su Zhenqiang, Hong Huixiao, Fang Hong, Shi Leming, Perkins Roger, Tong Weida

Affiliation

Center for Toxicoinformatics, National Center for Toxicological Research (NCTR), U.S. Food and Drug Administration (FDA), 3900 NCTR Road, Jefferson, AR 72079, USA.

Publication information

BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S9. doi: 10.1186/1471-2105-9-S9-S9.

DOI:10.1186/1471-2105-9-S9-S9
PMID:18793473
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2537560/
Abstract

BACKGROUND

Advances in DNA microarray technology portend that molecular signatures derived from microarrays will eventually be used in clinical environments and personalized medicine. Derivation of biomarkers is a large step beyond hypothesis generation and imposes considerably more stringency for accuracy in identifying informative gene subsets to differentiate phenotypes. The inherent nature of microarray data, with few samples and replicates compared to the large number of genes, requires identifying informative genes prior to classifier construction. However, improving the ability to identify differentiating genes remains a challenge in bioinformatics.

RESULTS

A new hybrid gene selection approach was investigated and tested with nine publicly available microarray datasets. The new method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. The method uses a bagging sampling principle, where the re-sampled arrays are used to identify the most informative genes. Frequency of selection across the repeated re-sampling rounds is used to identify the VIP genes. The putative informative genes are selected using two methods, the t-statistic and discriminatory analysis. With the t-statistic, informative genes are identified based on p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes are part of the pathways derived from the common genes shared by both the VIP and p-value ranking methods. Moreover, the binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value ranked genes in distinguishing different types of samples.
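The bagging-and-frequency idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pooled_t`, `select_vip`), the parameters (`n_rounds`, `top_k`, `min_freq`), and the toy data are all assumptions made for illustration, and only the t-statistic branch is shown. Genes are ranked by the absolute pooled-variance t-statistic, which gives the same ordering as ranking by two-sided p-value because every gene shares the same degrees of freedom.

```python
import random
from collections import Counter
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample Student t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0  # degenerate resample: treated as uninformative (assumption)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def select_vip(expr, labels, n_rounds=200, top_k=2, min_freq=0.8, seed=0):
    """Return {gene: selection frequency} for frequently selected genes.

    expr   : dict mapping gene name -> list of expression values (one per array)
    labels : list of 0/1 class labels, aligned with the arrays
    All parameter names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    counts = Counter()
    for _ in range(n_rounds):
        # Bagging step: resample arrays with replacement within each class.
        s0 = [rng.choice(idx0) for _ in idx0]
        s1 = [rng.choice(idx1) for _ in idx1]
        scores = {
            gene: abs(pooled_t([vals[i] for i in s0], [vals[i] for i in s1]))
            for gene, vals in expr.items()
        }
        # Keep the top_k genes of this round and count how often each is chosen.
        counts.update(sorted(scores, key=scores.get, reverse=True)[:top_k])
    freqs = {gene: counts[gene] / n_rounds for gene in expr}
    return {gene: f for gene, f in freqs.items() if f >= min_freq}


# Toy data: five arrays per class; two genes differ strongly between classes.
expr = {
    "g_diff1": [1.0, 1.2, 0.9, 1.1, 1.3, 10.0, 9.8, 10.2, 10.1, 9.9],
    "g_diff2": [8.0, 8.3, 7.9, 8.1, 8.2, 2.0, 2.2, 1.9, 2.1, 1.8],
    "g_noise1": [5.0, 6.0, 5.5, 6.5, 5.2, 5.8, 5.1, 6.2, 5.4, 6.1],
    "g_noise2": [3.1, 2.9, 3.4, 3.0, 3.3, 3.2, 3.0, 2.8, 3.5, 3.1],
    "g_noise3": [7.0, 7.4, 6.8, 7.1, 7.3, 7.2, 6.9, 7.5, 7.0, 7.1],
}
labels = [0] * 5 + [1] * 5
vip = select_vip(expr, labels)
print(sorted(vip))
```

In this sketch a gene enters the pool only if it ranks among the top genes in most resampling rounds, so a gene that scores well on a single favourable resample is filtered out. The discriminatory-analysis branch (disjoint PCA per class) would contribute its own per-round gene list in the same counting loop; it is omitted here.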

CONCLUSION

The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives since they are a part of pathways identified by the p-value ranking method and expected to be related to the relevant biology. Therefore, these additional genes derived from the VIP method potentially provide valuable biological insights.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b38/2537560/977752030d0c/1471-2105-9-S9-S9-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b38/2537560/518e04a3f714/1471-2105-9-S9-S9-1.jpg
