Unsupervised gene selection using biological knowledge: application in sample clustering.

Authors

Acharya Sudipta, Saha Sriparna, Nikhil N

Affiliations

IIT Patna, Department of Computer Science and Engineering, Patna, India.

IIT Ropar, Department of Computer Science and Engineering, Punjab, India.

Publication

BMC Bioinformatics. 2017 Nov 22;18(1):513. doi: 10.1186/s12859-017-1933-0.

DOI: 10.1186/s12859-017-1933-0
PMID: 29166852
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5700545/
Abstract

BACKGROUND

Classifying biological samples from gene expression data is a basic building block for several problems in bioinformatics, such as diagnosing cancer and other diseases and planning appropriate treatment. A major challenge in sample classification is handling high-dimensional, redundant gene expression data. Gene (feature) selection plays a major role in reducing the complexity of handling such data.

RESULTS

This paper explores the use of biological knowledge from the Gene Ontology database to select a subset of genes that then participates in sample clustering. The proposed feature selection technique is unsupervised: it uses no class label information during gene selection. Finally, a multi-objective clustering approach is applied to cluster the available samples in the reduced gene space.
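The two-stage idea described here (select genes without labels, then cluster samples in the reduced gene space) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify: the greedy redundancy filter, the threshold, the toy expression values, and the gene-gene similarity matrix (standing in for a Gene Ontology based similarity) are all assumptions made for illustration, and plain k-means is swapped in for the multi-objective clustering step.

```python
def select_genes(similarity, threshold=0.8):
    """Greedy redundancy filter (hypothetical): keep a gene only if its
    similarity to every already-kept gene is below the threshold.
    No class labels are consulted, so the selection is unsupervised."""
    kept = []
    for g in range(len(similarity)):
        if all(similarity[g][k] < threshold for k in kept):
            kept.append(g)
    return kept


def reduce_samples(expression, genes):
    """Project each sample (row) onto the selected gene columns."""
    return [[row[g] for g in genes] for row in expression]


def euclid(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization.
    Stand-in for the multi-objective clustering mentioned in the abstract."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(euclid(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each sample to its nearest centroid.
        labels = [min(range(k), key=lambda j: euclid(p, centroids[j])) for p in points]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data invented for illustration: 4 samples x 4 genes.
expression = [
    [1.0, 1.1, 5.0, 0.2],
    [0.9, 1.0, 5.2, 0.1],
    [4.0, 4.1, 1.0, 3.0],
    [4.2, 4.0, 0.8, 3.1],
]
# Hypothetical GO-derived gene-gene similarity (symmetric, 1.0 on the diagonal).
go_similarity = [
    [1.0, 0.9, 0.2, 0.3],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.4],
    [0.3, 0.2, 0.4, 1.0],
]

selected = select_genes(go_similarity, threshold=0.8)
reduced = reduce_samples(expression, selected)
labels = kmeans(reduced, k=2)
print(selected)  # [0, 2, 3]
print(labels)    # [0, 0, 1, 1]
```

In this toy run, gene 1 is dropped because it is highly similar to gene 0, and the four samples still separate into the same two groups in the three-gene space. A real pipeline would derive the similarity matrix from GO annotations and would need a principled way to choose the threshold and the number of clusters.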

CONCLUSIONS

The reported results show that incorporating biological knowledge into gene selection not only reduces the dimensionality of the feature space to a great extent but also improves the accuracy of sample classification. The reduced gene space is validated using strong biological significance tests. To demonstrate the superiority of the proposed gene-selection-based sample clustering technique, a thorough comparative analysis against state-of-the-art techniques was also performed.


