Denoiseit: denoising gene expression data using rank based isolation trees.

Affiliations

Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-gu, Seoul, 08826, Republic of Korea.

School of Computer Science and Engineering, Kyungpook National University, Buk-gu, Daegu, 41566, Republic of Korea.

Publication information

BMC Bioinformatics. 2024 Aug 21;25(1):271. doi: 10.1186/s12859-024-05899-z.

DOI:10.1186/s12859-024-05899-z
PMID:39169300
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11340143/
Abstract

BACKGROUND

Selecting informative genes or eliminating uninformative ones before any downstream gene expression analysis is a standard task with great impact on the results. A carefully curated gene set significantly enhances the likelihood of identifying meaningful biomarkers.

METHOD

In contrast to conventional forward gene search methods, which focus on selecting highly informative genes, we propose a backward search method, DenoiseIt, that removes potential outlier genes to yield a robust gene set with reduced noise. The gene set constructed by DenoiseIt is expected to capture biologically significant genes while pruning irrelevant ones to the greatest extent possible, and thereby to improve the quality of downstream comparative gene expression analysis. DenoiseIt uses non-negative matrix factorization in conjunction with isolation forests to identify outlier rank features and remove their associated genes.

RESULTS

DenoiseIt was applied to both bulk and single-cell RNA-seq data collected from TCGA and a COVID-19 cohort. It identified and removed genes exhibiting expression anomalies confined to specific samples rather than to a known group. DenoiseIt was also shown to reduce the level of technical noise while preserving a higher proportion of biologically relevant genes than existing methods. The DenoiseIt software is publicly available on GitHub at https://github.com/cobi-git/DenoiseIt.
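
The idea in the METHOD paragraph can be illustrated with a minimal sketch. This is not the authors' implementation (see the GitHub repository for that); it is one plausible reading of "non-negative matrix factorization in conjunction with isolation forests", written with scikit-learn. The function name `denoise_sketch`, the number of ranks, the per-rank scoring rule, the `score_cutoff` value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import IsolationForest


def denoise_sketch(X, n_ranks=4, score_cutoff=-0.75, seed=0):
    """Return (kept, removed, outlier_ranks) for a non-negative genes x samples matrix X.

    n_ranks and score_cutoff are hypothetical defaults chosen for illustration.
    """
    # Step 1: factorize X ~ W @ H, with W (genes x ranks) and H (ranks x samples).
    nmf = NMF(n_components=n_ranks, init="nndsvda", max_iter=1000, random_state=seed)
    W = nmf.fit_transform(X)
    H = nmf.components_

    # Step 2: for each rank, grow isolation trees over its per-sample loadings.
    # Assumed criterion: a rank is flagged when at least one sample is isolated
    # very quickly, i.e. the rank's signal is concentrated in a few samples.
    outlier_ranks = []
    for r in range(n_ranks):
        loadings = H[r].reshape(-1, 1)
        forest = IsolationForest(n_estimators=100, random_state=seed).fit(loadings)
        # score_samples: the lower the score, the more anomalous the sample.
        if forest.score_samples(loadings).min() < score_cutoff:
            outlier_ranks.append(r)

    # Step 3: assign each gene to the rank that contributes most to its expression,
    # then drop the genes whose dominant rank was flagged.
    contribution = W * H.sum(axis=1)  # genes x ranks
    dominant = contribution.argmax(axis=1)
    removed_mask = np.isin(dominant, outlier_ranks)
    return np.flatnonzero(~removed_mask), np.flatnonzero(removed_mask), outlier_ranks


# Synthetic demo (made-up data): two sample groups plus genes that spike in one sample.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.5, size=(60, 20))
X[:25, :10] += 5.0     # genes 0-24 are high in samples 0-9 (group A)
X[25:50, 10:] += 5.0   # genes 25-49 are high in samples 10-19 (group B)
X[50:, 3] += 20.0      # genes 50-59 spike only in sample 3

kept, removed, outlier_ranks = denoise_sketch(X)
print("outlier ranks:", outlier_ranks)
print("removed genes:", removed.tolist())
```

On real data the number of ranks and the cutoff would need tuning, and which genes end up removed depends on how NMF splits the signal across ranks, so the printed result should be inspected rather than assumed.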

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/060351c63561/12859_2024_5899_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/25e807a4cded/12859_2024_5899_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/0f82f2bc56f6/12859_2024_5899_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/515af2a6b2b6/12859_2024_5899_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/3896b5b56e2c/12859_2024_5899_Fig5_HTML.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/102f1cf77903/12859_2024_5899_Fig6_HTML.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/344c/11340143/74f575c427e4/12859_2024_5899_Fig7_HTML.jpg

Cited by

1
NanoBinder: a machine learning assisted nanobody binding prediction tool using Rosetta energy scores.
J Cheminform. 2025 Jun 16;17(1):96. doi: 10.1186/s13321-025-01040-1.
