Dhaka: variational autoencoder for unmasking tumor heterogeneity from single cell genomic data.

Authors

Rashid Sabrina, Shah Sohrab, Bar-Joseph Ziv, Pandya Ravi

Affiliations

Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15232, USA.

Department of Computer Science.

Publication information

Bioinformatics. 2021 Jul 12;37(11):1535-1543. doi: 10.1093/bioinformatics/btz095.

Abstract

MOTIVATION

Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells vary in gene expression, copy number and mutations even when they originate from a single progenitor cell. Single-cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single-cell genomic data in order to infer the cells' evolutionary trajectory remains computationally challenging because the data are extremely noisy and sparse.

RESULTS

Here we describe 'Dhaka', a variational autoencoder method that transforms single-cell genomic data into a reduced-dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data, including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic data and on six single-cell cancer datasets, with the number of cells ranging from 250 to 6000 per sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features can also be used to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data.
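
To make the idea of a variational autoencoder mapping cells into a low-dimensional feature space more concrete, the following is a minimal NumPy sketch of the forward pass and loss of a generic VAE. It is not the Dhaka implementation: the layer sizes, the 3-dimensional latent space, the linear decoder, the squared-error reconstruction term, the toy Poisson input and all function names are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def encode(x, params):
    """Map cells (rows of x) to the mean and log-variance of q(z|x)."""
    h = np.maximum(0.0, x @ params["W_h"] + params["b_h"])  # ReLU hidden layer
    mu = h @ params["W_mu"] + params["b_mu"]
    log_var = h @ params["W_lv"] + params["b_lv"]
    return mu, log_var

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z, params):
    """Map latent codes back to the input space (linear decoder for brevity)."""
    return z @ params["W_dec"] + params["b_dec"]

def negative_elbo(x, params):
    """Squared reconstruction error plus KL(q(z|x) || N(0, I)), averaged over cells."""
    mu, log_var = encode(x, params)
    x_hat = decode(sample_latent(mu, log_var), params)
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + kl)), float(np.mean(kl))

# Hypothetical sizes: 250 cells, 50 genes, 16 hidden units, 3 latent features.
n_cells, n_genes, n_hidden, n_latent = 250, 50, 16, 3

# Toy stand-in for a log-transformed expression matrix (not real data).
x = np.log2(1.0 + rng.poisson(1.0, size=(n_cells, n_genes)))

W_h, b_h = init_layer(n_genes, n_hidden)
W_mu, b_mu = init_layer(n_hidden, n_latent)
W_lv, b_lv = init_layer(n_hidden, n_latent)
W_dec, b_dec = init_layer(n_latent, n_genes)
params = {"W_h": W_h, "b_h": b_h, "W_mu": W_mu, "b_mu": b_mu,
          "W_lv": W_lv, "b_lv": b_lv, "W_dec": W_dec, "b_dec": b_dec}

mu, log_var = encode(x, params)      # mu is the per-cell reduced feature vector
loss, kl = negative_elbo(x, params)  # quantity a training loop would minimize
print(mu.shape, loss, kl)
```

In a trained model, the rows of `mu` would serve as the reduced-dimension features that are then clustered or ordered to look for subpopulations and trajectories. Training itself, which requires gradients of the loss with respect to the weights, is omitted here and would normally be done with a deep learning library.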

AVAILABILITY AND IMPLEMENTATION

All datasets used in the paper are publicly available, and the software package and supporting information are available on GitHub at https://github.com/MicrosoftGenomics/Dhaka.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
