Sukhov Vladimir, Nugmanova Aigul, Vorontsov Yury, Mehrotra Parul, Kleverov Maksim, Ravichandran Kodi, Artyomov Maxim, Sergushichev Alexey
Department of Pathology and Immunology, Washington University in St. Louis School of Medicine, St. Louis, MO 63110, United States.
Computer Technologies Laboratory, ITMO University, Saint Petersburg 197101, Russia.
Nucleic Acids Res. 2025 May 5. doi: 10.1093/nar/gkaf372.
Public data repositories like Gene Expression Omnibus (GEO) contain an extensive amount of data from hundreds of thousands of experiments, making them a valuable resource for researchers. A common scenario for utilizing this resource is to show transcriptional similarity of one's own data to a public dataset as evidence of potentially similar biology. However, when searching for such datasets, researchers are usually limited to keyword-based search, which requires having a specific hypothesis and relies on the presence of high-quality metadata in public datasets. Here, we introduce CORESH, a web server designed to systematically find, in a data-driven manner, GEO datasets that match a user-provided gene signature, such as a list of top upregulated genes in response to a treatment. CORESH operates on a compendium of >40 000 human and 40 000 mouse datasets and outputs a ranked list of datasets in which the input genes exhibit similar expression patterns. The discovered datasets can then be used to identify experimental conditions associated with the activation of the query signature, offering insights into underlying biological mechanisms and guiding experimental validation. CORESH is freely accessible at https://alserglab.wustl.edu/coresh/, requires no login, and is regularly updated with the latest GEO data.
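The abstract does not specify how CORESH scores a dataset against a query signature. Purely as an illustration of the general idea of "datasets in which the input genes exhibit similar expression patterns", the sketch below ranks datasets by the mean pairwise Pearson correlation among the signature genes measured in each one. This scoring rule, the function names (`coexpression_score`, `rank_datasets`), and the dictionary layout of the compendium are hypothetical assumptions, not CORESH's published method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant expression: treat the pair as uncorrelated
    return cov / (sx * sy)


def coexpression_score(dataset, signature):
    """Hypothetical score: mean pairwise correlation of signature genes.

    dataset maps gene name -> list of expression values across samples.
    Returns None when fewer than two signature genes are measured.
    """
    genes = [g for g in signature if g in dataset]
    if len(genes) < 2:
        return None
    pairs = list(combinations(genes, 2))
    return sum(pearson(dataset[a], dataset[b]) for a, b in pairs) / len(pairs)


def rank_datasets(compendium, signature):
    """Return (dataset_id, score) pairs, most coherent signature first."""
    scored = []
    for ds_id, dataset in compendium.items():
        score = coexpression_score(dataset, signature)
        if score is not None:  # skip datasets lacking enough signature genes
            scored.append((ds_id, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, with a toy compendium where the signature genes rise together across samples in one dataset but not in another, `rank_datasets` places the coherent dataset first. A real search over tens of thousands of datasets would also need to correct for signature size and dataset-specific correlation structure, which this sketch ignores.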