Jogani Saiyam, Pol Anand Santosh, Prajapati Mayur, Samal Amit, Bhatia Kriti, Parmar Jayendra, Patel Urvik, Shah Falak, Vyas Nisarg, Gupta Saurabh
Department of Generative AI & Bioinformatics, Infocusp Innovations, Laxman Nagar Baner, Pune 411045, Maharashtra, India.
Department of Generative AI & Bioinformatics, Infocusp Innovations, Gala-hub, Bopal, Ahmedabad 380058, Gujarat, India.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf243.
Single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) produces vast amounts of profiling data for individual cells. Analyzing these data poses a significant challenge: accurately annotating cell types and identifying their associated biomarkers. Various pipelines based on deep neural networks (DNNs) have been applied to this problem. They are promising because they can extract meaningful, concise features from noisy, heterogeneous, high-dimensional data, which improves annotation and subsequent analysis. However, existing tools require substantial computational resources to process datasets with large numbers of samples. We have developed a platform called scaLR (Single-cell analysis using low resource) that splits the data into feature subsets, processes samples in batches to reduce the memory needed for large datasets, and runs DNN models on multiple central processing units (CPUs). scaLR supports data processing, feature extraction, training, evaluation, and downstream analysis. Its novel feature extraction algorithm trains a model on each feature subset in turn and records the importance of every feature in that subset. Once all subsets have been trained, the top-K features are selected by importance. The final model is trained on these top-K features; its performance evaluation and the associated downstream analysis yield significant biomarkers for different cell types and diseases/traits. Our findings indicate that scaLR offers comparable prediction accuracy while requiring less model training time and fewer computational resources than existing Python-based pipelines. In summary, scaLR is a Python-based platform engineered to use minimal computational resources while keeping execution times and analysis costs comparable to those of existing frameworks.
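The subset-wise feature selection described in the abstract can be illustrated with a minimal sketch. This is not scaLR's API or implementation: all function names (`train_logistic`, `select_top_k`, `take_columns`, `accuracy`) are hypothetical, a small logistic regression stands in for the DNN, the absolute weight is assumed as the importance score, and the data are synthetic, with only features 3 and 7 carrying signal.

```python
import math
import random


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a binary logistic regression by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def take_columns(X, cols):
    """Return a copy of X restricted to the given feature indices."""
    return [[row[j] for j in cols] for row in X]


def select_top_k(X, y, chunk_size, k):
    """Train one model per feature subset, record importances, keep the top k."""
    n_features = len(X[0])
    importance = [0.0] * n_features
    for start in range(0, n_features, chunk_size):
        cols = list(range(start, min(start + chunk_size, n_features)))
        # Only this subset of columns is materialized while its model trains.
        w, _ = train_logistic(take_columns(X, cols), y)
        for j, wj in zip(cols, w):
            importance[j] = abs(wj)  # assumed importance score: |weight|
    ranked = sorted(range(n_features), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:k]), importance


def accuracy(X, y, w, b):
    """Fraction of samples whose predicted class matches the label."""
    hits = 0
    for row, label in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, row))
        hits += int((z > 0) == (label == 1))
    return hits / len(X)


# Synthetic stand-in for an expression matrix: 300 "cells" x 12 "genes".
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(12)] for _ in range(300)]
y = [1 if row[3] - row[7] > 0 else 0 for row in X]

# Step 1: score features subset by subset, then pick the top K overall.
selected, importance = select_top_k(X, y, chunk_size=4, k=2)

# Step 2: train the final model on the selected features only.
X_top = take_columns(X, selected)
w_final, b_final = train_logistic(X_top, y)
final_accuracy = accuracy(X_top, y, w_final, b_final)
print("selected features:", selected)
print("training accuracy on top-K features:", round(final_accuracy, 3))
```

Because each subset is scored by a separately trained model, the importance values are not guaranteed to share a common scale across subsets. The sketch compares them directly for simplicity; the abstract does not state how scaLR handles this.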