Evaluating the Reproducibility of Single-Cell Gene Regulatory Network Inference Algorithms.

Authors

Kang Yoonjee, Thieffry Denis, Cantini Laura

Affiliation

Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR 8197, INSERM U1024, Ecole Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France.

Publication information

Front Genet. 2021 Mar 22;12:617282. doi: 10.3389/fgene.2021.617282. eCollection 2021.

Abstract

Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq (scRNA-seq) data, numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks of single-cell network inference are mostly based on simulated data. When applied to real data, these benchmarks consider only a small set of genes and compare the inferred networks only with an imposed ground truth. Here, we benchmark six single-cell network inference methods based on their reproducibility, i.e., their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. When networks with up to 100,000 links are considered, GENIE3 proves to be the most reproducible algorithm and, together with GRNBoost2, shows a higher intersection with ground-truth biological interactions. These results are independent of the single-cell sequencing platform, the cell type annotation system, and the number of cells constituting the dataset. Finally, GRNBoost2 and CLR show more reproducible performance when a more stringent threshold is applied to the networks (1,000 down to 100 links). To ensure reproducibility and ease the extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
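The notion of reproducibility used in the abstract (similar networks inferred from two independent datasets, compared at a given number of retained links) can be illustrated with a minimal sketch. This is not the scNET implementation: the Jaccard index is an assumed similarity measure that may differ from the score used in the paper, and the function names, gene names, and link scores below are hypothetical toy values.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); hypothetical representation


def top_links(scores: Dict[Edge, float], k: int) -> Set[Edge]:
    """Threshold a network by keeping its k highest-scoring links."""
    ranked = sorted(scores, key=lambda edge: scores[edge], reverse=True)
    return set(ranked[:k])


def overlap(net_a: Set[Edge], net_b: Set[Edge]) -> float:
    """Jaccard index of two link sets (assumed measure, not the paper's)."""
    union = net_a | net_b
    if not union:
        return 0.0
    return len(net_a & net_b) / len(union)


# Toy link scores, as if one method had been run on two independent
# datasets for the same biological condition (made-up values).
dataset_1 = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.8,
             ("TF2", "G1"): 0.3, ("TF2", "G3"): 0.1}
dataset_2 = {("TF1", "G1"): 0.7, ("TF2", "G3"): 0.6,
             ("TF1", "G2"): 0.5, ("TF2", "G1"): 0.2}

# Compare the two networks at several thresholds (number of kept links).
for k in (1, 2, 3):
    score = overlap(top_links(dataset_1, k), top_links(dataset_2, k))
    print(f"top {k} links: overlap = {score:.2f}")
```

Varying `k` mirrors the idea of comparing methods at different network sizes, from permissive thresholds down to stringent ones.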
