Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning.

Author information

Harris Cole, Ghaffari Noushin

Affiliation

Exagen Diagnostics, Inc, Houston, TX, USA.

Publication information

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-9-S2-S7.

Abstract

The growing body of DNA microarray data has the potential to advance our understanding of the molecular basis of disease. However, annotating microarray datasets with clinically useful information is not always possible, as this often requires access to detailed patient records. In this study, we introduce GLAD, a new semi-supervised learning (SSL) method that combines independent annotated and unannotated datasets to identify more robust sample classifiers. In our method, independent models are developed using subsets of genes for the annotated and unannotated datasets. These models are evaluated with a scoring function that incorporates terms for classification accuracy on annotated data and relative cluster separation in unannotated data. Improved models are generated iteratively using a genetic algorithm feature selection technique. Our results show that adding unannotated data to training significantly improves classifier robustness.
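
The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the GLAD implementation: the abstract does not specify the classifier, the clustering method, the form of the scoring function, or the genetic algorithm settings. The nearest-centroid classifier, the 2-means clustering, the weight `w`, and all function names and parameters below are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed stand-in)."""
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        classes = np.unique(y[keep])
        centroids = np.array([X[keep][y[keep] == c].mean(axis=0) for c in classes])
        pred = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(pred == y[i])
    return correct / len(y)

def cluster_separation(X, n_iter=20):
    """Separation of a 2-means clustering, scaled to [0, 1] (assumed measure):
    between-centroid distance / (between-centroid + mean within-cluster distance)."""
    # Deterministic start: the sample farthest from the mean, and the sample farthest from it.
    a = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    b = X[np.argmax(np.linalg.norm(X - a, axis=1))]
    centers = np.array([a, b], dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    within = np.linalg.norm(X - centers[labels], axis=1).mean()
    between = np.linalg.norm(centers[0] - centers[1])
    total = between + within
    return float(between / total) if total > 0 else 0.0

def score(mask, Xa, ya, Xu, w=0.5):
    """Weighted sum of accuracy on annotated data and separation on unannotated data."""
    if not mask.any():
        return 0.0
    acc = loo_centroid_accuracy(Xa[:, mask], ya)
    sep = cluster_separation(Xu[:, mask])
    return w * acc + (1 - w) * sep

def ga_select(Xa, ya, Xu, pop_size=20, n_gen=15, p_mut=0.05, seed=0):
    """Toy genetic algorithm over boolean gene masks: keep the best half,
    refill with uniform crossover plus bit-flip mutation."""
    rng = np.random.default_rng(seed)
    n_genes = Xa.shape[1]
    pop = rng.random((pop_size, n_genes)) < 0.3
    for _ in range(n_gen):
        fitness = np.array([score(m, Xa, ya, Xu) for m in pop])
        parents = pop[np.argsort(fitness)[::-1][: pop_size // 2]]
        pairs = rng.integers(0, len(parents), size=(pop_size - len(parents), 2))
        take_first = rng.random((len(pairs), n_genes)) < 0.5
        children = np.where(take_first, parents[pairs[:, 0]], parents[pairs[:, 1]])
        children ^= rng.random(children.shape) < p_mut  # flip a few bits
        pop = np.vstack([parents, children])
    fitness = np.array([score(m, Xa, ya, Xu) for m in pop])
    return pop[fitness.argmax()], float(fitness.max())
```

Here `Xa` and `ya` would hold the annotated expression matrix (samples by genes) and its labels, and `Xu` the unannotated matrix over the same genes. Calling `ga_select(Xa, ya, Xu)` returns a boolean gene mask and its score. Because the better half of each generation is carried over unchanged, the best score never decreases between generations.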

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca3c/2559897/a78cbf2d7ae6/1471-2164-9-S2-S7-1.jpg
