Novel gene sets improve set-level classification of prokaryotic gene expression data.

Authors

Holec Matěj, Kuželka Ondřej, Železný Filip

Affiliations

Faculty of Electrical Engineering, Czech Technical University, Technická 2, Prague, 166 27, Czech Republic.

School of Computer Science and Informatics, Cardiff University, Queen's Buildings, 5 The Parade, Roath, Cardiff, CF24 3AA, UK.

Publication details

BMC Bioinformatics. 2015 Oct 28;16:348. doi: 10.1186/s12859-015-0786-7.

DOI:10.1186/s12859-015-0786-7
PMID:26511329
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4625461/
Abstract

BACKGROUND

Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction promises a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the unfavorable classification results reported in the set-level framework were due to the adoption of unsuitable gene sets, typically defined on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will make it possible to learn more accurate classifiers.
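
The conversion from gene-level to set-level features described above can be illustrated with a minimal sketch. It assumes that each gene set is summarized by the arithmetic mean of its members' expression values; the abstract does not state which aggregation the authors used, so the averaging rule, the function name `to_set_level`, the set names, and all expression values here are illustrative assumptions.

```python
from statistics import mean

def to_set_level(expression, gene_sets):
    """Convert one sample's gene-level expression into set-level features.

    expression: dict mapping gene name -> expression value (one sample)
    gene_sets:  dict mapping set name -> list of member gene names
    Genes missing from the sample are skipped; sets with no measured
    members are left out of the result.
    Assumption: a set's feature is the mean of its members' values.
    """
    features = {}
    for set_name, members in gene_sets.items():
        values = [expression[g] for g in members if g in expression]
        if values:
            features[set_name] = mean(values)
    return features

# Toy values for illustration only, not data from the paper.
sample = {"lacZ": 8.0, "lacY": 6.0, "lacA": 7.0, "araB": 2.0, "araA": 4.0}
gene_sets = {
    "lac_regulated": ["lacZ", "lacY", "lacA"],
    "ara_regulated": ["araB", "araA", "araD"],  # araD is not measured in this sample
    "unmeasured": ["xyzA"],                     # hypothetical gene, no measured members
}

set_features = to_set_level(sample, gene_sets)
print(set_features)
```

Five gene-level features collapse into two set-level features here, which is the dimensionality reduction the background refers to. A classifier would then be trained on the set-level vectors instead of the gene-level ones.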

METHODS

We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach.
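
The two-stage protocol above (pick one subtype per family, then test the picks on independent data) can be sketched as follows. This is only an outline under stated assumptions: the selection criterion (highest mean accuracy across selection data sets), the function name `select_best_subtype`, the subtype labels, and every accuracy value are hypothetical placeholders, not figures or procedures taken from the paper.

```python
from statistics import mean

def select_best_subtype(selection_scores):
    """Return the subtype with the highest mean accuracy.

    selection_scores: dict mapping subtype name -> list of accuracies,
    one per data set used for selection (never the testing data sets).
    """
    return max(selection_scores, key=lambda s: mean(selection_scores[s]))

# Hypothetical placeholder accuracies, NOT results from the paper.
family_a = {"a1": [0.70, 0.80], "a2": [0.75, 0.85]}
family_b = {"b1": [0.90, 0.60], "b2": [0.70, 0.70]}

# Stage 1: choose one subtype per family using selection data only.
chosen = [select_best_subtype(family_a), select_best_subtype(family_b)]
print(chosen)

# Stage 2 (not shown): evaluate only the chosen subtypes on independent
# testing data sets against baseline gene sets and the gene-level approach.
```

Keeping the selection and testing data sets separate matters because choosing the best subtype on the same data used for the final comparison would bias the reported accuracy upward.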

RESULTS

The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers.
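
One way to quantify how "correlated" a gene set is can be sketched as the mean absolute pairwise Pearson correlation between the expression profiles of its member genes. The abstract does not say which correlation measure the authors used, so this measure, the function names, and the toy profiles below are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_pairwise_correlation(profiles):
    """Mean absolute Pearson correlation over all pairs of gene profiles.

    profiles: list of per-gene expression profiles (one value per sample);
    at least two profiles are required.
    """
    pairs = list(combinations(profiles, 2))
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)

# Toy profiles over four samples, for illustration only.
coherent_set = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.5, 2.5, 3.5, 4.5]]
mixed_set = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]]

coherent_score = mean_pairwise_correlation(coherent_set)
mixed_score = mean_pairwise_correlation(mixed_set)
print(coherent_score, mixed_score)
```

Under this measure, a set whose genes rise and fall together scores near 1, while a set of unrelated genes scores near 0. Averaging the members of a low-scoring set tends to cancel out signal, which is the intuition behind preferring more correlated sets.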

CONCLUSION

Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b1b/4625461/64d8f9647bf0/12859_2015_786_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b1b/4625461/15cbc31e2af1/12859_2015_786_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b1b/4625461/86d30cedabd9/12859_2015_786_Fig3_HTML.jpg
