Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning.

Author information

Harris Cole, Ghaffari Noushin

Affiliation

Exagen Diagnostics, Inc, Houston, TX, USA.

Publication information

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-9-S2-S7.

DOI:10.1186/1471-2164-9-S2-S7

PMID:18831798

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2559897/

Abstract

The growing body of DNA microarray data has the potential to advance our understanding of the molecular basis of disease. However, annotating microarray datasets with clinically useful information is not always possible, as this often requires access to detailed patient records. In this study we introduce GLAD, a new Semi-Supervised Learning (SSL) method for combining independent annotated datasets and unannotated datasets with the aim of identifying more robust sample classifiers. In our method, independent models are developed using subsets of genes for the annotated and unannotated datasets. These models are evaluated according to a scoring function that incorporates terms for classification accuracy on annotated data and relative cluster separation in unannotated data. Improved models are iteratively generated using a genetic algorithm feature selection technique. Our results show that the addition of unannotated data into training significantly improves classifier robustness.
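The scoring idea in the abstract can be illustrated with a small sketch. This is not the authors' GLAD implementation: the nearest-centroid classifier, the 2-means clustering, the separation ratio, the `weight` parameter, and the mutation-only evolutionary search (no crossover) are all assumptions chosen to keep the example short. It only shows how a gene subset might be scored on both annotated and unannotated data and then improved iteratively.

```python
import numpy as np

def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (labels 0/1)."""
    correct = 0
    for i in range(len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[i] = False  # hold out sample i
        c0 = X[mask & (y == 0)].mean(axis=0)
        c1 = X[mask & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += pred == y[i]
    return correct / len(y)

def cluster_separation(X, n_iter=10):
    """2-means clustering, then between / (between + within), a value in [0, 1)."""
    # Deterministic start: the point farthest from the mean, then the point farthest from it.
    a = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    b = X[np.argmax(np.linalg.norm(X - a, axis=1))]
    centers = np.stack([a, b])
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
    within = d.min(axis=1).mean()          # mean distance to the nearest center
    between = np.linalg.norm(centers[0] - centers[1])
    return between / (between + within + 1e-12)

def score(genes, X_lab, y_lab, X_unlab, weight=0.5):
    """Assumed combined score: accuracy on annotated data plus weighted separation."""
    acc = loo_centroid_accuracy(X_lab[:, genes], y_lab)
    sep = cluster_separation(X_unlab[:, genes])
    return acc + weight * sep

def select_genes(X_lab, y_lab, X_unlab, k=3, pop_size=20, generations=30, seed=0):
    """Mutation-only evolutionary search over gene subsets of size k (k < n_genes)."""
    rng = np.random.default_rng(seed)
    n_genes = X_lab.shape[1]
    pop = [rng.choice(n_genes, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: score(g, X_lab, y_lab, X_unlab), reverse=True)
        survivors = pop[: pop_size // 2]   # keep the better half (elitism)
        children = []
        for parent in survivors:
            child = parent.copy()
            # Mutation: replace one gene with a gene not already in the subset.
            candidates = np.setdiff1d(np.arange(n_genes), child)
            child[rng.integers(k)] = rng.choice(candidates)
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda g: score(g, X_lab, y_lab, X_unlab))
```

Here `X_lab` and `X_unlab` are assumed to be samples-by-genes expression matrices from two independent datasets that share the same gene columns, and `y_lab` holds binary class labels for the annotated samples only. A call such as `select_genes(X_lab, y_lab, X_unlab, k=3)` returns the indices of the best-scoring gene subset found.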

Similar articles

Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data.

Bioinformatics. 2005 Oct 15;21(20):3905-11. doi: 10.1093/bioinformatics/bti647. Epub 2005 Aug 30.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Induction of comprehensible models for gene expression datasets by subgroup discovery methodology.

J Biomed Inform. 2004 Aug;37(4):269-84. doi: 10.1016/j.jbi.2004.07.007.

A stable iterative method for refining discriminative gene clusters.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S18. doi: 10.1186/1471-2164-9-S2-S18.

Independent component analysis-based penalized discriminant method for tumor classification using gene expression data.

Bioinformatics. 2006 Aug 1;22(15):1855-62. doi: 10.1093/bioinformatics/btl190. Epub 2006 May 18.

Small, fuzzy and interpretable gene expression based classifiers.

Bioinformatics. 2005 May 1;21(9):1964-70. doi: 10.1093/bioinformatics/bti287. Epub 2005 Jan 20.

Robust and efficient identification of biomarkers by classifying features on graphs.

Bioinformatics. 2008 Sep 15;24(18):2023-9. doi: 10.1093/bioinformatics/btn383. Epub 2008 Jul 24.

Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE.

BMC Bioinformatics. 2006 Dec 25;7:543. doi: 10.1186/1471-2105-7-543.

Filter versus wrapper gene selection approaches in DNA microarray domains.

Artif Intell Med. 2004 Jun;31(2):91-103. doi: 10.1016/j.artmed.2004.01.007.

Cited by

Accounting for control mislabeling in case-control biomarker studies.

J Proteome Res. 2011 Dec 2;10(12):5562-7. doi: 10.1021/pr200507b. Epub 2011 Nov 8.

Genomics, molecular imaging, bioinformatics, and bio-nano-info integration are synergistic components of translational medicine and personalized healthcare research.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):I1. doi: 10.1186/1471-2164-9-S2-I1.

References

Semi-supervised analysis of gene expression profiles for lineage-specific development in the Caenorhabditis elegans embryo.

Bioinformatics. 2006 Jul 15;22(14):e417-23. doi: 10.1093/bioinformatics/btl256.

Gene expression signature of primary imatinib-resistant chronic myeloid leukemia patients.

Leukemia. 2006 Aug;20(8):1400-7. doi: 10.1038/sj.leu.2404270. Epub 2006 May 25.

Molecular profiling of CD34+ cells identifies low expression of CD7, along with high expression of proteinase 3 or elastase, as predictors of longer survival in patients with CML.

Blood. 2006 Jan 1;107(1):205-12. doi: 10.1182/blood-2005-05-2155. Epub 2005 Sep 6.

The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical Hodgkin lymphoma.

Blood. 2003 Dec 1;102(12):3871-9. doi: 10.1182/blood-2003-06-1841. Epub 2003 Aug 21.

Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning.

Nat Med. 2002 Jan;8(1):68-74. doi: 10.1038/nm0102-68.

MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia.

Nat Genet. 2002 Jan;30(1):41-7. doi: 10.1038/ng765. Epub 2001 Dec 3.

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Science. 1999 Oct 15;286(5439):531-7. doi: 10.1126/science.286.5439.531.

Cluster analysis and display of genome-wide expression patterns.

Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14863-8. doi: 10.1073/pnas.95.25.14863.

Quantitative monitoring of gene expression patterns with a complementary DNA microarray.

Science. 1995 Oct 20;270(5235):467-70. doi: 10.1126/science.270.5235.467.
