Du Dongping, Bhardwaj Saurabh, Parker Sarah J, Cheng Zuolin, Zhang Zhen, Lu Yingzhou, Van Eyk Jennifer E, Yu Guoqiang, Clarke Robert, Herrington David M, Wang Yue
Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India.
bioRxiv. 2023 Jul 5:2023.07.05.547797. doi: 10.1101/2023.07.05.547797.
Analytics tools are essential for identifying molecular features that are informative about different phenotypic groups. Among the most fundamental tasks are missing value imputation, signature gene detection, and expression pattern visualization. However, the most commonly used analytics tools can be problematic for characterizing biologically diverse samples, either when signature genes have uneven missing rates across groups and involve complex missing mechanisms, or when multiple biological groups are compared and visualized simultaneously.
We develop the ABDS tool suite, tailored specifically to analyzing biologically diverse samples. It comprises three tools: mechanism-integrated group-wise imputation, developed to recruit signature genes involving informative missingness; a cosine-based one-sample test, extended to detect enumerated signature genes; and a unified heatmap, designed to display complex expression patterns in a comparable way. We discuss the methodological principles and demonstrate the conceptual advantages of the three software tools. We also showcase the biomedical applications of the individual tools. Implemented in open-source R scripts, the ABDS tool suite complements rather than replaces existing tools and will allow biologists to more accurately detect interpretable molecular signals among diverse phenotypic samples.
The R scripts of the ABDS tool suite are freely available at https://github.com/niccolodpdu/ABDS.
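To make the idea of cosine-based signature scoring concrete, the sketch below computes the cosine between a gene's vector of group-mean expression values and each group's coordinate axis. A cosine near 1 means the gene is expressed almost exclusively in one group. This is a minimal illustration in Python, not code from the ABDS R scripts: the function names, the toy gene names, and the expression values are all hypothetical, the sketch covers only the single-group case, and it computes only a score, not the test's significance assessment.

```python
import math


def cosine_to_group_axes(group_means):
    """Cosine between a non-negative group-mean vector and each group axis e_k.

    Because e_k is a unit vector with a single 1 in position k,
    cos(x, e_k) simplifies to x_k / ||x||.
    """
    norm = math.sqrt(sum(v * v for v in group_means))
    if norm == 0.0:
        # A gene with no expression in any group carries no signal.
        return [0.0] * len(group_means)
    return [v / norm for v in group_means]


def best_group(group_means):
    """Return (index of the closest group axis, cosine to that axis)."""
    cosines = cosine_to_group_axes(group_means)
    k = max(range(len(cosines)), key=cosines.__getitem__)
    return k, cosines[k]


# Hypothetical group-mean expression for three genes across three groups.
genes = {
    "geneA": [9.0, 0.5, 0.5],  # mostly group 0
    "geneB": [3.0, 3.0, 3.0],  # uniform, not a signature gene
    "geneC": [0.2, 0.1, 8.0],  # mostly group 2
}

# Rank genes by how closely they align with a single group axis.
ranked = sorted(
    ((name,) + best_group(means) for name, means in genes.items()),
    key=lambda row: row[2],
    reverse=True,
)

for name, group, score in ranked:
    print(f"{name}: group {group}, cosine {score:.3f}")
```

A uniformly expressed gene scores 1/sqrt(K) for K groups, the lowest possible value for non-negative data, so the score separates group-specific genes from housekeeping-like ones without requiring pairwise group comparisons.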