Redundancy-aware unsupervised ranking based on game theory: Ranking pathways in collections of gene sets.

Affiliations

Department of Computer Science, TU Dortmund, Dortmund, Germany.

Department of Medical Biometry, Informatics and Epidemiology (IMBIE), University Hospital Bonn, Bonn, Germany.

Publication information

PLoS One. 2023 Mar 9;18(3):e0282699. doi: 10.1371/journal.pone.0282699. eCollection 2023.

DOI:10.1371/journal.pone.0282699
PMID:36893181
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9997904/
Abstract

In genetics, gene sets are grouped into collections according to their biological function. This often leads to high-dimensional, overlapping, and redundant families of sets, which precludes a straightforward interpretation of their biological meaning. In data mining, it is often argued that dimensionality-reduction techniques can make large data easier to handle and consequently easier to interpret. In recent years, moreover, the machine learning and bioinformatics communities have become increasingly aware of the importance of understanding data and of interpretable models. On the one hand, there are techniques that aggregate overlapping gene sets to create larger pathways. While these methods could partly solve the problem of the collections' large size, modifying biological pathways is hardly justifiable in this biological context. On the other hand, the representation methods proposed so far to increase the interpretability of collections of gene sets have proved insufficient. Inspired by this bioinformatics context, we propose a method to rank sets within a family of sets based on the distribution of the singletons and their size. We obtain the sets' importance scores by computing Shapley values; by making use of microarray games, we avoid the typical exponential computational complexity. Moreover, we address the challenge of constructing redundancy-aware rankings where, in our case, redundancy is a quantity proportional to the size of the intersections among the sets in the collections. We use the obtained rankings to reduce the dimension of the families, thereby lowering the redundancy among sets while still preserving a high coverage of their elements. We finally evaluate our approach on collections of gene sets and apply Gene Set Enrichment Analysis techniques to the now smaller collections: as expected, given the unsupervised nature of the proposed rankings, the number of significant gene sets for specific phenotypic traits changes only unremarkably. In contrast, the number of statistical tests performed can be drastically reduced. The proposed rankings are of practical use in bioinformatics for increasing the interpretability of collections of gene sets, and they are a step towards including redundancy-awareness in Shapley value computations.
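To make the pipeline in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a microarray-game reading in which the players are the sets and each element of the universe plays the role of a sample, so that the Shapley value of a set has the closed form (1/n) Σ 1/c(g) over its elements g, where c(g) is the number of sets containing g. The redundancy penalty used here (one minus the largest Jaccard overlap with the sets already ranked) and the greedy selection are illustrative assumptions, since the abstract only states that redundancy is proportional to intersection size. All function names and the toy family are hypothetical.

```python
from fractions import Fraction


def shapley_microarray(family):
    """Closed-form Shapley values of sets (assumed microarray-game reading).

    Each element g splits a weight of 1/n evenly among the c(g) sets
    that contain it, so no enumeration of coalitions is needed.
    """
    universe = set().union(*family.values())
    n = len(universe)
    count = {g: sum(g in s for s in family.values()) for g in universe}
    return {
        name: sum((Fraction(1, count[g]) for g in s), Fraction(0)) / n
        for name, s in family.items()
    }


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| (0 for two empty sets)."""
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


def redundancy_aware_ranking(family):
    """Greedy ranking: Shapley value discounted by overlap (an assumption).

    At every step the remaining set with the highest penalised score is
    appended; ties are broken alphabetically by set name.
    """
    phi = shapley_microarray(family)
    ranked, remaining = [], set(family)

    def score(name):
        overlap = max(
            (jaccard(family[name], family[r]) for r in ranked),
            default=Fraction(0),
        )
        return phi[name] * (1 - overlap)

    while remaining:
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def coverage(family, kept):
    """Fraction of the universe still covered by the kept sets."""
    universe = set().union(*family.values())
    covered = set().union(*(family[k] for k in kept))
    return Fraction(len(covered), len(universe))


# Hypothetical toy family: "B" is almost entirely contained in "A".
family = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {5, 6}, "D": {6, 7}}
ranking = redundancy_aware_ranking(family)
print(ranking)                         # ['A', 'C', 'D', 'B']
print(coverage(family, ranking[:3]))   # 1
```

In this toy family the largely redundant set "B" is ranked last, and keeping only the top three sets still covers every element, which mirrors the trade-off between lower redundancy and high coverage described in the abstract.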


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd75/9997904/b1c814869bdf/pone.0282699.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd75/9997904/1f2d2b045ec7/pone.0282699.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd75/9997904/c5a027635f63/pone.0282699.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd75/9997904/2aea2e06e61f/pone.0282699.g003.jpg

Similar articles

1
Redundancy-aware unsupervised ranking based on game theory: Ranking pathways in collections of gene sets.
PLoS One. 2023 Mar 9;18(3):e0282699. doi: 10.1371/journal.pone.0282699. eCollection 2023.
2
Using set theory to reduce redundancy in pathway sets.
BMC Bioinformatics. 2018 Oct 19;19(1):386. doi: 10.1186/s12859-018-2355-3.
3
CAFÉ-Map: Context Aware Feature Mapping for mining high dimensional biomedical data.
Comput Biol Med. 2016 Dec 1;79:68-79. doi: 10.1016/j.compbiomed.2016.10.006. Epub 2016 Oct 11.
4
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification
5
6
Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.
Prog Brain Res. 2006;158:83-108. doi: 10.1016/S0079-6123(06)58004-5.
7
Explaining multivariate molecular diagnostic tests via Shapley values.
BMC Med Inform Decis Mak. 2021 Jul 8;21(1):211. doi: 10.1186/s12911-021-01569-9.
8
Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.
Adv Bioinformatics. 2016;2016:1058305. doi: 10.1155/2016/1058305. Epub 2016 Mar 20.
9
LineUp: visual analysis of multi-attribute rankings.
IEEE Trans Vis Comput Graph. 2013 Dec;19(12):2277-86. doi: 10.1109/TVCG.2013.173.
10
The cure: design and evaluation of a crowdsourcing game for gene selection for breast cancer survival prediction.
JMIR Serious Games. 2014 Jul 29;2(2):e7. doi: 10.2196/games.3350.

Cited by

1
Pathway Analysis Interpretation in the Multi-Omic Era.
BioTech (Basel). 2025 Jul 29;14(3):58. doi: 10.3390/biotech14030058.
