
Applications of Community Detection Algorithms to Large Biological Datasets.

Affiliation

BIU, Department of Bioengineering, Bar-Ilan University, Ramat Gan, Israel.

Publication details

Methods Mol Biol. 2021;2243:59-80. doi: 10.1007/978-1-0716-1103-6_3.

DOI: 10.1007/978-1-0716-1103-6_3
PMID: 33606252
Abstract

Recent advances in data acquisition technologies in biology have created major challenges in mining relevant information from large datasets. For example, single-cell RNA sequencing technologies produce expression and sequence information from tens of thousands of cells in every single experiment. A common task in analyzing biological data is to cluster samples or features (e.g., genes) into groups sharing common characteristics. This is an NP-hard problem for which numerous heuristic algorithms have been developed. However, in many cases, the clusters created by these algorithms do not reflect biological reality. To overcome this, a Networks Based Clustering (NBC) approach was recently proposed, in which the samples or genes in the dataset are first mapped to a network and community detection (CD) algorithms are then used to identify clusters of nodes.

Here, we created an open and flexible Python-based toolkit for NBC that makes network construction and community detection easy and accessible. We then tested the applicability of NBC for identifying clusters of cells or genes in previously published large-scale single-cell and bulk RNA-seq datasets.

We show that NBC can be used to accurately and efficiently analyze large-scale datasets from RNA sequencing experiments.
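The two NBC steps described in the abstract (map samples to a network, then run community detection on that network) can be illustrated with a minimal, dependency-free sketch. It is not the authors' toolkit and does not reproduce its API. The k-nearest-neighbour graph built from Pearson correlation, the value of `k`, and label propagation as the community detection algorithm are all assumptions chosen for brevity. The function names (`pearson`, `knn_graph`, `label_propagation`, `communities`) and the toy expression values are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector has no defined correlation; treat as unrelated
    return cov / (sx * sy)


def knn_graph(samples, k):
    """Step 1 (network construction, assumed kNN scheme): one node per sample,
    linked to its k most correlated samples; only positive correlations become
    edge weights. Returns an undirected adjacency dict {node: {neighbour: weight}}."""
    n = len(samples)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        sims = [(pearson(samples[i], samples[j]), j) for j in range(n) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # strongest first, ties by index
        for sim, j in sims[:k]:
            if sim > 0:
                adj[i][j] = sim
                adj[j][i] = sim  # symmetrise so the graph is undirected
    return adj


def label_propagation(adj, max_iter=100):
    """Step 2 (community detection, swapped-in algorithm): asynchronous label
    propagation. Each node adopts the label with the largest total edge weight
    among its neighbours (ties -> smallest label) until no label changes."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):  # fixed visiting order keeps the sketch deterministic
            if not adj[v]:
                continue  # an isolated node keeps its own label
            weight = Counter()
            for u, w in adj[v].items():
                weight[labels[u]] += w
            best = max(weight.values())
            new = min(lab for lab, w in weight.items() if w == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


def communities(labels):
    """Group node ids by their final label, as sorted lists."""
    groups = {}
    for v, lab in labels.items():
        groups.setdefault(lab, []).append(v)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy expression matrix: rows are cells, columns are genes (made-up values).
    cells = [
        [10, 9, 1, 0], [9, 10, 0, 1], [11, 8, 1, 1],  # high in genes 0-1
        [0, 1, 10, 9], [1, 0, 9, 10], [1, 1, 8, 11],  # high in genes 2-3
    ]
    graph = knn_graph(cells, k=2)
    print(communities(label_propagation(graph)))  # → [[0, 1, 2], [3, 4, 5]]
```

To cluster genes instead of cells, the same functions can be applied to the transposed matrix (one row per gene). The pairwise loops here are quadratic in the number of samples. For datasets with tens of thousands of cells, an approximate nearest-neighbour search and an optimized community detection library would be the more practical choice.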


Similar articles

1
Applications of Community Detection Algorithms to Large Biological Datasets.
Methods Mol Biol. 2021;2243:59-80. doi: 10.1007/978-1-0716-1103-6_3.
2
A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa.
PLoS Comput Biol. 2018 Apr 9;14(4):e1006053. doi: 10.1371/journal.pcbi.1006053. eCollection 2018 Apr.
3
VPAC: Variational projection for accurate clustering of single-cell transcriptomic data.
BMC Bioinformatics. 2019 May 1;20(Suppl 7):0. doi: 10.1186/s12859-019-2742-4.
4
A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies.
Nat Commun. 2019 Apr 9;10(1):1649. doi: 10.1038/s41467-019-09639-3.
5
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.
6
DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.
Bioinformatics. 2018 Jan 1;34(1):139-146. doi: 10.1093/bioinformatics/btx490.
7
SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble.
Nucleic Acids Res. 2020 Jan 10;48(1):86-95. doi: 10.1093/nar/gkz959.
8
Data Analysis in Single-Cell Transcriptome Sequencing.
Methods Mol Biol. 2018;1754:311-326. doi: 10.1007/978-1-4939-7717-8_18.
9
Recursive Consensus Clustering for novel subtype discovery from transcriptome data.
Sci Rep. 2020 Jul 3;10(1):11005. doi: 10.1038/s41598-020-67016-3.
10
Automatic Cell Type Annotation Using Marker Genes for Single-Cell RNA Sequencing Data.
Biomolecules. 2022 Oct 21;12(10):1539. doi: 10.3390/biom12101539.

Cited by

1
Identification of Distinct Biological Groups of Patients With Cryptogenic NORSE via Inflammatory Profiling.
Neurol Neuroimmunol Neuroinflamm. 2025 Jul;12(4):e200403. doi: 10.1212/NXI.0000000000200403. Epub 2025 May 7.
2
Patterns of Dietary Fatty Acids and Fat Spreads in Relation to Blood Pressure, Lipids and Insulin Resistance in Young Adults: A Repeat Cross-Sectional Study.
Nutrients. 2025 Feb 28;17(5):869. doi: 10.3390/nu17050869.
3
Phenotyping to predict 12-month health outcomes of older general medicine patients.
Aging Clin Exp Res. 2025 Feb 22;37(1):42. doi: 10.1007/s40520-024-02924-2.
4
Gut microbiome strain-sharing within isolated village social networks.
Nature. 2025 Jan;637(8044):167-175. doi: 10.1038/s41586-024-08222-1. Epub 2024 Nov 20.
5
Clustering Molecules at a Large Scale: Integrating Spectral Geometry with Deep Learning.
Molecules. 2024 Aug 17;29(16):3902. doi: 10.3390/molecules29163902.
6
..
J Biosci. 2022;47(2). doi: 10.1007/s12038-022-00253-y.