Denoiseit: denoising gene expression data using rank based isolation trees.

Affiliations

Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-gu, Seoul, 08826, Republic of Korea.

School of Computer Science and Engineering, Kyungpook National University, Buk-gu, Daegu, 41566, Republic of Korea.

Publication information

BMC Bioinformatics. 2024 Aug 21;25(1):271. doi: 10.1186/s12859-024-05899-z.

Abstract

BACKGROUND

Selecting informative genes or eliminating uninformative ones before any downstream gene expression analysis is a standard task with great impact on the results. A carefully curated gene set significantly enhances the likelihood of identifying meaningful biomarkers.

METHOD

In contrast to conventional forward gene search methods, which focus on selecting highly informative genes, we propose a backward search method, DenoiseIt, that removes potential outlier genes to yield a robust gene set with reduced noise. The gene set constructed by DenoiseIt is expected to capture biologically significant genes while pruning irrelevant ones to the greatest extent possible, which in turn improves the quality of downstream comparative gene expression analysis. DenoiseIt uses non-negative matrix factorization in conjunction with isolation forests to identify outlier rank features and remove their associated genes.
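The general idea of combining non-negative matrix factorization with an isolation forest can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the DenoiseIt repository). The function name `denoise_sketch`, the parameters `n_ranks` and `top_frac`, and the rule "drop the genes with the largest loadings on each flagged rank" are all assumptions made here for illustration, built on scikit-learn's `NMF` and `IsolationForest`.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import IsolationForest


def denoise_sketch(expr, n_ranks=5, top_frac=0.05, seed=0):
    """Hypothetical helper: return (keep_mask, outlier_ranks).

    expr is a non-negative array of shape (n_genes, n_samples).
    """
    # Factorize expr ~ W @ H with W: (genes, ranks) and H: (ranks, samples).
    nmf = NMF(n_components=n_ranks, init="nndsvda", max_iter=500,
              random_state=seed)
    W = nmf.fit_transform(expr)
    H = nmf.components_

    # Assumption: treat each rank's sample profile (a row of H) as one
    # point and let an isolation forest flag unusual ranks (label -1).
    forest = IsolationForest(random_state=seed)
    labels = forest.fit_predict(H)
    outlier_ranks = np.flatnonzero(labels == -1)

    # Assumption: the genes "associated" with a flagged rank are those
    # with the largest loadings in the matching column of W.
    n_top = max(1, int(top_frac * expr.shape[0]))
    drop = np.zeros(expr.shape[0], dtype=bool)
    for r in outlier_ranks:
        drop[np.argsort(W[:, r])[-n_top:]] = True

    return ~drop, outlier_ranks
```

Calling `keep, ranks = denoise_sketch(expr)` on a genes-by-samples matrix returns a boolean mask, so `expr[keep]` is the reduced matrix. With only a handful of ranks the isolation forest may flag none of them, in which case no genes are removed.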

RESULTS

DenoiseIt was applied to both bulk and single-cell RNA-seq data collected from TCGA and a COVID-19 cohort. It identified and removed genes whose expression anomalies were confined to specific samples rather than to a known group. DenoiseIt was also shown to reduce technical noise while preserving a higher proportion of biologically relevant genes than existing methods. The DenoiseIt software is publicly available on GitHub at https://github.com/cobi-git/DenoiseIt.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/060351c63561/12859_2024_5899_Fig1_HTML.jpg
