Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA.
Genes (Basel). 2023 Apr 23;14(5):961. doi: 10.3390/genes14050961.
With the growing use of high-throughput technologies, multi-omics data comprising several types of high-dimensional omics data are increasingly being generated to explore the association between host molecular mechanisms and disease. In this study, we present adaptive sparse multi-block partial least squares discriminant analysis (asmbPLS-DA), an extension of our previous work, asmbPLS. This integrative approach identifies the most relevant features across different types of omics data while discriminating between multiple disease outcome groups. Using simulated data under various scenarios and a real dataset from the TCGA project, we demonstrate that asmbPLS-DA can identify key biomarkers from each type of omics data with better biological relevance than existing competitive methods. Moreover, asmbPLS-DA showed comparable performance in classifying subjects by disease status or phenotype from integrated multi-omics molecular profiles, especially when combined with other classification algorithms such as linear discriminant analysis and random forest. We have made an R package implementing this method publicly available on GitHub. Overall, asmbPLS-DA achieved competitive performance in both feature selection and classification. We believe that asmbPLS-DA can be a valuable tool for multi-omics research.
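To make the idea of sparse, block-wise feature selection combined with discrimination more concrete, the following is a minimal toy sketch in Python. It is not the asmbPLS-DA algorithm and not the API of the authors' R package. It assumes a single latent component, a binary outcome coded as -1/+1, a soft threshold set at a per-block quantile of the absolute feature weights (an illustrative choice), and a simple midpoint rule for classification in place of linear discriminant analysis or random forest. All function names and the toy data are hypothetical.

```python
from math import sqrt

def center_columns(X):
    """Subtract each column's mean so every feature varies around zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def sparse_block_weights(X, y, quantile):
    """Feature weights X^T y, soft-thresholded at a quantile of |weight|."""
    p = len(X[0])
    w = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(p)]
    ranked = sorted(abs(v) for v in w)
    k = int(quantile * p)  # at least this many features are zeroed out
    lam = ranked[k - 1] if k > 0 else 0.0
    shrunk = [(1 if v > 0 else -1) * max(abs(v) - lam, 0.0) for v in w]
    norm = sqrt(sum(v * v for v in shrunk)) or 1.0
    return [v / norm for v in shrunk]

def fit_predict(blocks, y, quantiles):
    """One-component fit on -1/+1 labels; returns selected features and predictions."""
    centered = [center_columns(X) for X in blocks]
    weights = [sparse_block_weights(X, y, q) for X, q in zip(centered, quantiles)]
    # Block scores: projection of each sample onto its block's sparse weights.
    scores = [[sum(x * w for x, w in zip(row, wb)) for row in X]
              for X, wb in zip(centered, weights)]
    # Block weights: how strongly each block score covaries with the outcome.
    c = [sum(t * yi for t, yi in zip(tb, y)) for tb in scores]
    c_norm = sqrt(sum(v * v for v in c)) or 1.0
    super_score = [sum(cb / c_norm * tb[i] for cb, tb in zip(c, scores))
                   for i in range(len(y))]
    # Classify by the midpoint between the two class-mean super scores.
    pos = [s for s, yi in zip(super_score, y) if yi > 0]
    neg = [s for s, yi in zip(super_score, y) if yi < 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    predicted = [1 if s > cut else -1 for s in super_score]
    selected = [[j for j, w in enumerate(wb) if w != 0] for wb in weights]
    return selected, predicted

# Hypothetical toy data: two omics blocks measured on the same six subjects.
block_a = [[1.0, 0.2, 0.5], [1.2, -0.1, 0.4], [0.9, 0.1, 0.6],
           [3.0, 0.0, 0.5], [3.1, 0.2, 0.4], [2.8, -0.2, 0.6]]
block_b = [[0.3, 5.0], [0.1, 5.2], [0.2, 4.9],
           [0.2, 2.0], [0.3, 2.1], [0.1, 1.9]]
labels = [-1, -1, -1, 1, 1, 1]

selected, predicted = fit_predict([block_a, block_b], labels, [0.67, 0.5])
print("selected feature indices per block:", selected)
print("predicted labels:", predicted)
```

In this sketch, each block receives its own sparsity level, so a block with many uninformative features can be thinned more aggressively than a small one. The super score could equally be passed to a separate classifier, which is the kind of combination the abstract describes with linear discriminant analysis and random forest.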