Ten Foo Wei, Yuan Dongsheng, Jabareen Nabil, Phua Yin Jun, Eils Roland, Lukassen Sören, Conrad Christian
Center for Digital Health, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany.
Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Berlin, Germany.
Front Cell Dev Biol. 2023 Feb 15;11:1091047. doi: 10.3389/fcell.2023.1091047. eCollection 2023.
Feature identification and manual inspection are still integral parts of biological data analysis in single-cell sequencing. Features such as expressed genes and open chromatin status are selectively studied in specific contexts, cell states, or experimental conditions. While conventional analysis methods construct a relatively static view of gene candidates, artificial neural networks have been used to model their interactions in the manner of hierarchical gene regulatory networks. However, it is challenging to identify consistent features in this modeling process because these methods are inherently stochastic. Therefore, we propose using ensembles of autoencoders and subsequent rank aggregation to extract consensus features in a less biased manner. Here, we analyzed sequencing data of different modalities, either independently or simultaneously, as well as in combination with other analysis tools. Our resVAE ensemble method can complement existing analyses and find additional unbiased biological insights with minimal data processing or feature selection steps, while giving a measure of confidence, which is especially relevant for models using stochastic or approximation algorithms. In addition, unlike most conventional tools, our method can work with overlapping cluster identity assignments, which suits transitional cell types or cell fates.
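The idea of extracting consensus features from an ensemble by rank aggregation can be illustrated with a minimal sketch. The abstract does not specify which aggregation scheme is used, so the example below assumes a simple mean-rank (Borda-style) aggregation as a stand-in. The gene names, the importance scores, and the helper functions `rank_features` and `aggregate_ranks` are all hypothetical and are not part of resVAE. The fraction of runs in which a feature reaches the top `top_k` serves here as a rough confidence measure.

```python
import numpy as np

def rank_features(scores):
    """Rank features within one run; rank 0 is the highest score."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

def aggregate_ranks(run_scores, top_k=2):
    """Aggregate per-run feature rankings by mean rank (an assumed scheme).

    run_scores: array of shape (n_runs, n_features), one row of feature
    importance scores per independently trained model.
    Returns the consensus feature order, the mean rank per feature, and
    the fraction of runs in which each feature ranked within the top_k.
    """
    run_scores = np.asarray(run_scores, dtype=float)
    ranks = np.vstack([rank_features(r) for r in run_scores])
    mean_rank = ranks.mean(axis=0)
    top_k_freq = (ranks < top_k).mean(axis=0)
    consensus_order = np.argsort(mean_rank, kind="stable")
    return consensus_order, mean_rank, top_k_freq

# Hypothetical importance scores from three independently trained models
genes = ["geneA", "geneB", "geneC", "geneD"]
runs = np.array([
    [0.9, 0.1, 0.5, 0.3],
    [0.8, 0.2, 0.3, 0.6],
    [0.7, 0.1, 0.6, 0.4],
])

order, mean_rank, freq = aggregate_ranks(runs, top_k=2)
consensus = [genes[i] for i in order]
print(consensus)
print(dict(zip(genes, freq.round(2))))
```

In this toy input, geneC and geneD swap places between runs, which is the kind of run-to-run variation that a single stochastic model would hide; aggregating over the ensemble gives a stable ordering together with a per-feature indication of how consistently each feature was selected.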