Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA.
Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, 147004, Punjab, India.
Sci Rep. 2024 Nov 16;14(1):28265. doi: 10.1038/s41598-024-78076-0.
Bioinformatics software tools are essential to identify informative molecular features that define different phenotypic sample groups. Among the most fundamental and interrelated tasks are missing value imputation, signature gene detection, and differential pattern visualization. However, many commonly used analytics tools can be problematic when handling biologically diverse samples if either informative missingness exhibits high missing rates with mixed missing mechanisms, or multiple sample groups are compared and visualized in parallel. We developed the ABDS tool suite specifically for analyzing biologically diverse samples. Collectively, a mechanism-integrated group-wise pre-imputation scheme is proposed to retain informative missingness associated with signature genes, a cosine-based one-sample test is extended to detect group-silenced signature genes, and a unified heatmap is designed to display multiple sample groups. We describe the methodological principles and demonstrate the effectiveness of the three analytics tools under targeted scenarios, supported by comparative evaluations and biomedical showcases. As an open-source R package, the ABDS tool suite complements rather than replaces existing tools and will allow biologists to more accurately detect interpretable molecular signals among phenotypically diverse sample groups.