Identifying genes that contribute most to good classification in microarrays.

Author information

Baker Stuart G, Kramer Barnett S

Affiliation

Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD 20892-7354, USA.

Publication information

BMC Bioinformatics. 2006 Sep 7;7:407. doi: 10.1186/1471-2105-7-407.

PMID:16959042

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1574352/

Abstract

BACKGROUND

The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples).
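The frequency-counting part of this strategy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the filter statistic (absolute difference in class means divided by the pooled within-class standard deviation), the stratified 50/50 training split, the 0/1 label coding and the function names `separation_score`, `select_genes` and `gene_frequencies` are all hypothetical choices made for the example.

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, pvariance


def separation_score(a, b):
    """Hypothetical filter statistic: |mean(a) - mean(b)| over the pooled within-class SD."""
    pooled_var = (len(a) * pvariance(a) + len(b) * pvariance(b)) / (len(a) + len(b))
    return abs(mean(a) - mean(b)) / sqrt(pooled_var + 1e-12)


def select_genes(X, y, n_genes):
    """Filter step: rank genes using the training samples only and keep the top n_genes."""
    ranked = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 1]
        b = [row[g] for row, label in zip(X, y) if label == 0]
        ranked.append((separation_score(a, b), g))
    ranked.sort(reverse=True)
    return [g for _, g in ranked[:n_genes]]


def gene_frequencies(X, y, n_genes=5, n_splits=100, train_frac=0.5, seed=0):
    """Fraction of random training splits in which each gene is selected by the filter.

    X is a list of samples (each a list of expression values); y holds 0/1 labels.
    Splits are stratified (an assumption) so every training set has both classes.
    """
    rng = random.Random(seed)
    cases = [i for i, label in enumerate(y) if label == 1]
    controls = [i for i, label in enumerate(y) if label == 0]
    counts = Counter()
    for _ in range(n_splits):
        rng.shuffle(cases)
        rng.shuffle(controls)
        train = (cases[: max(1, int(train_frac * len(cases)))]
                 + controls[: max(1, int(train_frac * len(controls)))])
        chosen = select_genes([X[i] for i in train], [y[i] for i in train], n_genes)
        counts.update(chosen)
    return {g: c / n_splits for g, c in counts.items()}
```

The held-out samples of each split are not used in this sketch. In the strategy described above they are where each rule's performance is checked, and gene frequencies are only worth interpreting once rules with few genes have been shown to perform well.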

RESULTS

We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia.
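A nearest centroid rule evaluated with an ROC curve might look like the sketch below. It assumes the genes have already been chosen by a filter applied to the training samples, and it ranks test samples by a hypothetical score d0 - d1 (squared distance to the class-0 centroid minus squared distance to the class-1 centroid). The paper's exact scoring and ROC construction are not given here, and the trapezoidal area under the curve is only one common summary of an ROC curve; all function names are illustrative.

```python
from math import fsum


def centroid(rows, genes):
    """Mean expression of the selected genes across a list of samples."""
    return [fsum(row[g] for row in rows) / len(rows) for g in genes]


def nearest_centroid_scores(train_X, train_y, test_X, genes):
    """For each test sample return d0 - d1, the squared Euclidean distance to the
    class-0 centroid minus that to the class-1 centroid (positive = closer to class 1)."""
    c1 = centroid([x for x, label in zip(train_X, train_y) if label == 1], genes)
    c0 = centroid([x for x, label in zip(train_X, train_y) if label == 0], genes)
    scores = []
    for x in test_X:
        values = [x[g] for g in genes]
        d1 = fsum((v - c) ** 2 for v, c in zip(values, c1))
        d0 = fsum((v - c) ** 2 for v, c in zip(values, c0))
        scores.append(d0 - d1)
    return scores


def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, sweeping the threshold downward."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, label in zip(scores, labels) if s >= t and label == 1)
        fp = sum(1 for s, label in zip(scores, labels) if s >= t and label == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return fsum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Assigning a sample to class 1 when its score is positive is exactly the nearest centroid rule; sweeping the threshold over all scores traces the full ROC curve, which summarizes performance across every possible cutoff rather than at a single one.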

CONCLUSION

Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86da/1574352/48409f033baf/1471-2105-7-407-1.jpg

Similar articles

Identifying genes that contribute most to good classification in microarrays.

BMC Bioinformatics. 2006 Sep 7;7:407. doi: 10.1186/1471-2105-7-407.

Accurate molecular classification of cancer using simple rules.

BMC Med Genomics. 2009 Oct 30;2:64. doi: 10.1186/1755-8794-2-64.

A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.

Simultaneous gene clustering and subset selection for sample classification via MDL.

Bioinformatics. 2003 Jun 12;19(9):1100-9. doi: 10.1093/bioinformatics/btg039.

Reliable classification of two-class cancer data using evolutionary algorithms.

Biosystems. 2003 Nov;72(1-2):111-29. doi: 10.1016/s0303-2647(03)00138-2.

Effective dimension reduction methods for tumor classification using gene expression data.

Bioinformatics. 2003 Mar 22;19(5):563-70. doi: 10.1093/bioinformatics/btg062.

Gene selection from microarray data for cancer classification--a machine learning approach.

Comput Biol Chem. 2005 Feb;29(1):37-46. doi: 10.1016/j.compbiolchem.2004.11.001.

Tumor classification by partial least squares using microarray gene expression data.

Bioinformatics. 2002 Jan;18(1):39-50. doi: 10.1093/bioinformatics/18.1.39.

On the statistical assessment of classifiers using DNA microarray data.

BMC Bioinformatics. 2006 Aug 19;7:387. doi: 10.1186/1471-2105-7-387.

An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

Bioinformatics. 2003 Nov 1;19(16):2131-40. doi: 10.1093/bioinformatics/btg296.

Cited by

Early-stage multi-cancer detection using an extracellular vesicle protein-based blood test.

Commun Med (Lond). 2022 Mar 17;2:29. doi: 10.1038/s43856-022-00088-6. eCollection 2022.

LiKidMiRs: A ddPCR-Based Panel of 4 Circulating miRNAs for Detection of Renal Cell Carcinoma.

Cancers (Basel). 2022 Feb 9;14(4):858. doi: 10.3390/cancers14040858.

Early detection of the major male cancer types in blood-based liquid biopsies using a DNA methylation panel.

Clin Epigenetics. 2019 Dec 2;11(1):175. doi: 10.1186/s13148-019-0779-x.

Subtyping Lung Cancer Using DNA Methylation in Liquid Biopsies.

J Clin Med. 2019 Sep 19;8(9):1500. doi: 10.3390/jcm8091500.

Cell-Free DNA Methylation of Selected Genes Allows for Early Detection of the Major Cancers in Women.

Cancers (Basel). 2018 Sep 26;10(10):357. doi: 10.3390/cancers10100357.

Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors.

Bioinformatics. 2016 May 1;32(9):1338-45. doi: 10.1093/bioinformatics/btv764. Epub 2016 Jan 6.

Combined rule extraction and feature elimination in supervised classification.

IEEE Trans Nanobioscience. 2012 Sep;11(3):228-36. doi: 10.1109/TNB.2012.2213264.

Supervised Bayesian latent class models for high-dimensional data.

Stat Med. 2012 Jun 15;31(13):1342-60. doi: 10.1002/sim.4448. Epub 2012 Apr 11.

A jackknife and voting classifier approach to feature selection and classification.

Cancer Inform. 2011 Apr 27;10:133-47. doi: 10.4137/CIN.S7111.

Systems biology and cancer: promises and perils.

Prog Biophys Mol Biol. 2011 Aug;106(2):410-3. doi: 10.1016/j.pbiomolbio.2011.03.002. Epub 2011 Mar 23.

References

Regularized binormal ROC method in disease classification using microarray data.

BMC Bioinformatics. 2006 May 9;7:253. doi: 10.1186/1471-2105-7-253.

Evaluating markers for the early detection of cancer: overview of study designs and methods.

Clin Trials. 2006;3(1):43-56. doi: 10.1191/1740774506cn130oa.

Gene selection algorithms for microarray data based on least squares support vector machine.

BMC Bioinformatics. 2006 Feb 27;7:95. doi: 10.1186/1471-2105-7-95.

Fibroblastic polyp of the colon: clinicopathological analysis of 10 cases with emphasis on its common association with serrated crypts.

Histopathology. 2006 Mar;48(4):431-7. doi: 10.1111/j.1365-2559.2006.02357.x.

Classification of microarrays to nearest centroids.

Bioinformatics. 2005 Nov 15;21(22):4148-54. doi: 10.1093/bioinformatics/bti681. Epub 2005 Sep 20.

Prediction of cancer outcome with microarrays: a multiple random validation strategy.

Lancet. 2005;365(9458):488-92. doi: 10.1016/S0140-6736(05)17866-0.

Gene selection from microarray data for cancer classification--a machine learning approach.

Comput Biol Chem. 2005 Feb;29(1):37-46. doi: 10.1016/j.compbiolchem.2004.11.001.

Gene mining: a novel and powerful ensemble decision approach to hunting for disease genes using microarray expression profiling.

Nucleic Acids Res. 2004 May 17;32(9):2685-94. doi: 10.1093/nar/gkh563. Print 2004.

Localization of serglycin in human neutrophil granulocytes and their precursors.

J Leukoc Biol. 2004 Aug;76(2):406-15. doi: 10.1189/jlb.1003502. Epub 2004 May 10.

Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data.

Br J Cancer. 2003 Nov 3;89(9):1599-604. doi: 10.1038/sj.bjc.6601326.
