Filter versus wrapper gene selection approaches in DNA microarray domains.

Authors

Inza Iñaki, Larrañaga Pedro, Blanco Rosa, Cerrolaza Antonio J

Affiliation

Department of Computer Science and Artificial Intelligence, University of the Basque Country, P.O. Box 649, E-20080 Donostia-San Sebastián, Basque Country, Spain.

Publication

Artif Intell Med. 2004 Jun;31(2):91-103. doi: 10.1016/j.artmed.2004.01.007.

DOI: 10.1016/j.artmed.2004.01.007
PMID: 15219288
Abstract

DNA microarray experiments, which generate thousands of gene expression measurements, are used to collect information from tissue and cell samples about gene expression differences that could be useful for disease diagnosis, for distinguishing specific tumor types, and so on. One important application of gene expression microarray data is the classification of samples into known categories. Because DNA microarray technology measures gene expression en masse, the resulting data have far more features (genes) than samples. Since the predictive accuracy of supervised classifiers that try to discriminate between the classes of the problem degrades in the presence of irrelevant and redundant features, a dimensionality reduction process is essential. We propose the application of a gene selection process, which also enables biology researchers to focus on promising gene candidates that actively contribute to classification in these large-scale microarrays. Two basic approaches to feature selection appear in the machine learning and pattern recognition literature: filter and wrapper techniques. Filter procedures are used in most works in the area of DNA microarrays. In this work, a group of different filter metrics is compared with a wrapper sequential search procedure. The comparison is performed on two well-known DNA microarray datasets using four classic supervised classifiers. The study is carried out on both the original continuous gene expression data and the same data discretized into three intervals. Two well-known filter metrics are applied to the continuous data, and four classic filter measures to the discretized data. The same wrapper approach is used for both continuous and discretized data.
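The filter idea described above, scoring each gene on its own, independently of any classifier, and keeping the top-ranked genes, can be sketched as follows. This is an illustrative sketch only: the abstract does not name the paper's filter metrics or its discretization procedure, so the signal-to-noise score for continuous data, equal-width three-interval discretization, and information gain for discretized data are assumptions chosen as common examples. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Continuous filter score |mu0 - mu1| / (sd0 + sd1) for one gene (labels 0/1).
    statistics.stdev needs at least two samples per class."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    # Small epsilon avoids division by zero for genes with no within-class spread.
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b) + 1e-12)


def discretize_three(values):
    """Equal-width discretization of one gene into three intervals (0, 1, 2)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 3 or 1.0  # constant gene: every value falls in bin 0
    return [min(int((v - lo) / width), 2) for v in values]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Discrete filter score: H(class) - H(class | discretized gene)."""
    bins = discretize_three(values)
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def filter_select(X, labels, k, score):
    """Score every gene (column of X) independently and return the top-k indices."""
    n_genes = len(X[0])
    scores = [score([row[j] for row in X], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: 6 samples x 3 genes; only gene 0 separates the classes.
X_toy = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.1],
    [0.9, 4.0, 2.0],
    [3.0, 4.5, 2.05],
    [3.1, 3.2, 2.0],
    [2.9, 5.1, 2.1],
]
y_toy = [0, 0, 0, 1, 1, 1]
print(filter_select(X_toy, y_toy, 1, signal_to_noise))   # -> [0]
print(filter_select(X_toy, y_toy, 1, information_gain))  # -> [0]
```

Each gene is scored exactly once, so the cost of a filter grows only linearly with the number of genes and never involves training a classifier.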
Applying filter and wrapper gene selection procedures yields considerably better accuracy than the approach without gene selection, together with notable dimensionality reductions. Although the wrapper approach generally achieves higher accuracy than the filter metrics, this improvement comes at a considerably higher computational cost. Most of the genes selected by the proposed filter and wrapper procedures, on both discrete and continuous microarray data, appear in the lists of relevant, informative genes reported by previous studies on these datasets. The aim of this work is to contribute to the gene selection task in DNA microarray datasets. Through an extensive comparison with the more popular filter techniques, we aim to contribute to the expansion and study of the wrapper approach in this type of domain.
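A wrapper, by contrast, scores candidate gene subsets by the estimated accuracy of the classifier itself, which is where its extra computational load comes from. The sketch below assumes sequential forward selection as the sequential search, a 1-nearest-neighbour classifier, and leave-one-out accuracy as the estimate; the abstract does not specify these choices, and the function names and toy data are hypothetical.

```python
import math


def loo_accuracy_1nn(X, labels, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given genes."""
    n = len(X)
    correct = 0
    for i in range(n):
        best_dist, best_label = math.inf, None
        for j in range(n):
            if j == i:
                continue  # hold sample i out
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_dist:
                best_dist, best_label = d, labels[j]
        correct += best_label == labels[i]
    return correct / n


def sequential_forward_selection(X, labels, evaluate):
    """Greedy wrapper search: repeatedly add the gene that most improves
    evaluate(X, labels, subset); stop when no single addition improves it."""
    n_genes = len(X[0])
    selected, best_score, evaluations = [], 0.0, 0
    while len(selected) < n_genes:
        scored = []
        for g in range(n_genes):
            if g in selected:
                continue
            scored.append((evaluate(X, labels, selected + [g]), g))
            evaluations += 1  # every call re-runs the full classifier evaluation
        # Highest score wins; ties go to the lowest gene index.
        score, gene = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(gene)
        best_score = score
    return selected, best_score, evaluations


# Hypothetical toy data: 6 samples x 3 genes; only gene 0 separates the classes.
X_toy = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.1],
    [0.9, 4.0, 2.0],
    [3.0, 4.5, 2.05],
    [3.1, 3.2, 2.0],
    [2.9, 5.1, 2.1],
]
y_toy = [0, 0, 0, 1, 1, 1]
print(sequential_forward_selection(X_toy, y_toy, loo_accuracy_1nn))  # -> ([0], 1.0, 5)
```

Every candidate subset triggers a complete leave-one-out run, so each step of the search costs one classifier evaluation per remaining gene. With thousands of genes this quickly dwarfs the single scoring pass a filter needs, which is consistent with the computational cost the abstract attributes to the wrapper approach.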
