Category encoding method to select feature genes for the classification of bulk and single-cell RNA-seq data.

Affiliations

Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Institute of Statistical Sciences, College of Mathematics and Statistics, Shenzhen University, Shenzhen, China.

Department of Mathematics, Hong Kong University, Pokfulam, Hong Kong.

Publication information

Stat Med. 2021 Aug 15;40(18):4077-4089. doi: 10.1002/sim.9015. Epub 2021 May 24.

DOI:10.1002/sim.9015
PMID:34028849
Abstract

Bulk and single-cell RNA-seq (scRNA-seq) data are increasingly used in place of traditional technologies in biological and medical research, for example to detect differentially expressed (DE) genes. Several statistical methods have been developed to classify bulk and single-cell RNA-seq data, and the feature genes a classifier uses are vitally important to its performance. Most genes are not DE and are therefore irrelevant for distinguishing classes. Removing these irrelevant genes is necessary to improve classification performance and save computation time, and it aids the detection of the important feature genes. Widely used schemes in the literature, such as the BSS/WSS (BW) method, assume that the data are normally distributed and may therefore be unsuitable for bulk and single-cell RNA-seq data. In this article, a category encoding (CAEN) method is proposed to select feature genes for the classification of bulk and single-cell RNA-seq data. For each gene, the method encodes the class categories using the ranks of the sequenced samples in each class. A correlation coefficient between the gene and the class is then computed from the sample ranks and the new category ranks. Genes with the highest correlation coefficients are selected as feature genes, as these are the most effective for classifying bulk and single-cell RNA-seq datasets. Sure screening and rank consistency properties were also established for the proposed CAEN method. Simulation studies show that, in most settings, a classifier using the proposed CAEN method performs better than, or at least as well as, existing methods. Analyses of existing real datasets likewise show that the proposed method outperforms current competitors. The method has been implemented in an R package named "CAEN" to facilitate wide use.
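The abstract describes the category encoding only at a high level, so the Python sketch below illustrates one plausible reading of it rather than the CAEN package itself. It assumes that, for each gene, samples are ranked by count, each class is recoded by the order of its mean sample rank, and genes are scored by the Pearson correlation between the sample ranks and the ranks of the encoded categories (a Spearman-type rank correlation). The encoding rule, the function names (`encode_categories`, `rank_category_score`, `select_feature_genes`), and the omission of library-size normalization are all assumptions for illustration. A standard BSS/WSS ratio is included for comparison with the BW baseline.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def encode_categories(gene_ranks, labels):
    """Hypothetical encoding: code each class 1..K by the order of its mean sample rank."""
    classes = np.unique(labels)
    mean_rank = {c: gene_ranks[labels == c].mean() for c in classes}
    ordered = sorted(classes, key=lambda c: mean_rank[c])
    code = {c: k + 1 for k, c in enumerate(ordered)}
    return np.array([code[c] for c in labels], dtype=float)


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def rank_category_score(counts, labels):
    """Rank correlation between one gene's counts and its encoded categories."""
    gene_ranks = average_ranks(counts)
    category_ranks = average_ranks(encode_categories(gene_ranks, labels))
    return pearson(gene_ranks, category_ranks)


def bw_ratio(x, labels):
    """Standard BSS/WSS ratio for one gene (mean/variance-based baseline)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    overall = x.mean()
    bss = wss = 0.0
    for c in np.unique(labels):
        xc = x[labels == c]
        bss += len(xc) * (xc.mean() - overall) ** 2
        wss += ((xc - xc.mean()) ** 2).sum()
    return np.inf if wss == 0 else bss / wss


def select_feature_genes(count_matrix, labels, n_top):
    """Return (indices of the n_top best-scoring genes, all scores); rows are genes."""
    labels = np.asarray(labels)
    scores = np.array([rank_category_score(row, labels) for row in count_matrix])
    return np.argsort(-scores, kind="stable")[:n_top], scores


if __name__ == "__main__":
    # Toy data: 3 genes x 6 samples, two classes of 3 samples each.
    counts = np.array([
        [1, 2, 3, 10, 11, 12],  # higher in class 1
        [5, 1, 4, 2, 6, 3],     # no clear class pattern
        [12, 11, 10, 3, 2, 1],  # higher in class 0
    ])
    labels = np.array([0, 0, 0, 1, 1, 1])
    top, scores = select_feature_genes(counts, labels, n_top=2)
    print(top)  # [0 2]
```

Because the encoding orders the classes separately for each gene, a gene up-regulated in class 0 scores the same as one up-regulated in class 1 (genes 0 and 2 in the toy data). The rank-based score is also unchanged by any monotone transformation of the counts, which is not true of the BW ratio.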


Similar articles

1. Category encoding method to select feature genes for the classification of bulk and single-cell RNA-seq data.
   Stat Med. 2021 Aug 15;40(18):4077-4089. doi: 10.1002/sim.9015. Epub 2021 May 24.
2. A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data.
   BMC Genomics. 2021 Jun 26;22(1):479. doi: 10.1186/s12864-021-07790-0.
3. scDLC: a deep learning framework to classify large sample single-cell RNA-seq data.
   BMC Genomics. 2022 Jul 12;23(1):504. doi: 10.1186/s12864-022-08715-1.
4. Improving bulk RNA-seq classification by transferring gene signature from single cells in acute myeloid leukemia.
   Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac002.
5. Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data.
   Nat Commun. 2022 Oct 30;13(1):6494. doi: 10.1038/s41467-022-34277-7.
6. Modeling group heteroscedasticity in single-cell RNA-seq pseudo-bulk data.
   Genome Biol. 2023 May 5;24(1):107. doi: 10.1186/s13059-023-02949-2.
7. A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa.
   PLoS Comput Biol. 2018 Apr 9;14(4):e1006053. doi: 10.1371/journal.pcbi.1006053. eCollection 2018 Apr.
8. Detecting cell-type-specific allelic expression imbalance by integrative analysis of bulk and single-cell RNA sequencing data.
   PLoS Genet. 2021 Mar 4;17(3):e1009080. doi: 10.1371/journal.pgen.1009080. eCollection 2021 Mar.
9. A novel method for predicting cell abundance based on single-cell RNA-seq data.
   BMC Bioinformatics. 2021 Aug 25;22(Suppl 9):281. doi: 10.1186/s12859-021-04187-4.
10. SimBu: bias-aware simulation of bulk RNA-seq data with variable cell-type composition.
   Bioinformatics. 2022 Sep 16;38(Suppl_2):ii141-ii147. doi: 10.1093/bioinformatics/btac499.

Cited by

1. scRGCL: a cell type annotation method for single-cell RNA-seq data using residual graph convolutional neural network with contrastive learning.
   Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae662.