Systematic comparison of ranking aggregation methods for gene lists in experimental results.

Affiliations

Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.

University of Sheffield, Sheffield S10 2NT, UK.

Publication information

Bioinformatics. 2022 Oct 31;38(21):4927-4933. doi: 10.1093/bioinformatics/btac621.

DOI:10.1093/bioinformatics/btac621
PMID:36094347
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9620830/
Abstract

MOTIVATION

A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. Gene lists from a group of studies that address the same, or similar, questions can be combined with ranking aggregation methods to find a consensus or a more reliable answer. Because the properties of a dataset can influence an algorithm's performance, a ranking aggregation method should be evaluated on the specific type of data before it is used. For gene lists, such evaluations are usually based on simulated data, because real data lack a known ground truth. However, simulated datasets tend to be small compared with experimental data and neglect key features, including heterogeneity in quality and relevance, and the inclusion of unranked lists.
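To make the idea of combining gene lists concrete, here is a minimal sketch of a Borda-style rank aggregation in Python. It is an illustrative baseline chosen for this example, not necessarily one of the methods compared in the paper. Assumptions: each study's list is given as a hypothetical `(genes, is_ranked)` pair; an unranked list gives every member the average score a ranked list of the same length would give; and genes absent from a list contribute nothing. The gene names are placeholders.

```python
from collections import defaultdict


def borda_aggregate(lists):
    """Combine gene lists into one consensus ranking (Borda-style baseline).

    `lists` is a sequence of (genes, is_ranked) pairs. In a ranked list of
    length n, the gene at 0-based position i scores (n - i) / n, so the top
    gene scores 1 and the last scores 1/n. An unranked list gives every member
    the average of those scores, (n + 1) / (2n). Absent genes score 0.
    """
    scores = defaultdict(float)
    for genes, is_ranked in lists:
        n = len(genes)
        for i, gene in enumerate(genes):
            scores[gene] += (n - i) / n if is_ranked else (n + 1) / (2 * n)
    # Highest total first; ties broken by gene identifier for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))


studies = [
    (["GENE_A", "GENE_B", "GENE_C"], True),  # ranked study
    (["GENE_B", "GENE_C", "GENE_D"], True),  # ranked study
    (["GENE_C", "GENE_B"], False),           # unranked study (membership only)
]
print(borda_aggregate(studies))  # ['GENE_B', 'GENE_C', 'GENE_A', 'GENE_D']
```

Scoring unranked lists at the average position lets them add evidence for membership without pretending to an order they do not have; in the example, GENE_C is listed first in the unranked study but gains no advantage from that position.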

RESULTS

In this study, a group of existing methods, and variations of them, that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore how the aggregation methods perform under common scenarios emulated from real genomic data: varying heterogeneity in quality, different noise levels, and a mix of ranked and unranked lists, over a universe of 20,000 possible entities. In addition to the evaluation with simulated data, the methods were compared on real genomic data for the SARS-CoV-2 virus, cancer (non-small cell lung cancer) and bacteria (macrophage apoptosis). We summarize the results of our evaluation in a simple flowchart for selecting a ranking aggregation method, and in an automated implementation that uses the meta-analysis by information content (MAIC) algorithm to infer the heterogeneity of data quality across input datasets.
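The simulation design described above can be illustrated with a toy generator. This is a minimal sketch under assumptions of our own, not the authors' simulation code (which is in the comparison_of_RA_methods repository): entities 0 to n_true - 1 are the true hits, each study observes `quality * signal + N(0, 1)` noise and reports its top entries either ranked or shuffled, so `qualities` controls heterogeneity of quality and `ranked_flags` the ranked/unranked mix. `simulate_lists`, `borda_aggregate` and `recall_at_k` are hypothetical names, and the Borda-style aggregator is a simple baseline, not MAIC or a specific method benchmarked in the paper.

```python
import random
from collections import defaultdict


def simulate_lists(n_entities, n_true, qualities, list_len, ranked_flags, seed=0):
    """Emulate one gene list per study (hypothetical noise model).

    Entities 0..n_true-1 carry a true signal of 1, all others 0. Each study
    observes quality * signal + N(0, 1) noise and reports its top `list_len`
    entities, either in score order (ranked) or shuffled (unranked).
    """
    rng = random.Random(seed)
    lists = []
    for quality, ranked in zip(qualities, ranked_flags):
        observed = [
            (quality * (1.0 if e < n_true else 0.0) + rng.gauss(0.0, 1.0), e)
            for e in range(n_entities)
        ]
        observed.sort(reverse=True)  # highest observed score first
        top = [e for _, e in observed[:list_len]]
        if not ranked:
            rng.shuffle(top)  # unranked lists carry membership only
        lists.append((top, ranked))
    return lists


def borda_aggregate(lists):
    """Borda-style baseline: a ranked list of length n gives 0-based position i
    the score (n - i) / n; an unranked list gives each member (n + 1) / (2n);
    absent entities get 0. Returns entities by descending total score."""
    scores = defaultdict(float)
    for genes, is_ranked in lists:
        n = len(genes)
        for i, gene in enumerate(genes):
            scores[gene] += (n - i) / n if is_ranked else (n + 1) / (2 * n)
    return sorted(scores, key=lambda g: (-scores[g], g))


def recall_at_k(ranking, n_true, k):
    """Fraction of the true entities found in the top k of the consensus."""
    return sum(1 for e in ranking[:k] if e < n_true) / n_true


if __name__ == "__main__":
    # 20,000 candidate entities as in the paper; the other values are illustrative.
    lists = simulate_lists(20_000, 100, qualities=[4.0, 4.0, 1.0, 0.0, 0.0],
                           list_len=200, ranked_flags=[True, True, False, True, False])
    print(recall_at_k(borda_aggregate(lists), n_true=100, k=100))
```

Because the true hits are known by construction, recall at k gives a direct score for any aggregator plugged in place of `borda_aggregate`; mixing high-quality and pure-noise entries in `qualities` emulates the heterogeneity of quality the study examines.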

AVAILABILITY AND IMPLEMENTATION

Code for generating the simulated data and running edited versions of the algorithms: https://github.com/baillielab/comparison_of_RA_methods. Code that selects the optimal method based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded from https://github.com/baillielab/maic. An online service for running MAIC: https://baillielab.net/maic.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2447/9620830/2b883164c4f6/btac621f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2447/9620830/0bf6d29875ae/btac621f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2447/9620830/c83e804ae57b/btac621f2.jpg
