Tercan Bahar, Apolonio Victor H, Chagas Vinicius S, Wong Christopher K, Lee Jordan A, Yau Christina, Benz Christopher C, Stuart Joshua M, Karlberg Brian J, Ellrott Kyle, Grewal Jasleen K, Jones Steven J M, Zenklusen Jean C, Robertson A Gordon, Laird Peter W, Cherniack Andrew D, Castro Mauro A A
Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA.
Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba, PR 81520-260, Brazil.
STAR Protoc. 2025 Jun 20;6(2):103681. doi: 10.1016/j.xpro.2025.103681. Epub 2025 Mar 18.
Because genes tend to be co-regulated as modules, feature selection in machine learning (ML) on gene expression data can be challenged by the complexity of gene regulation. Here, we present a protocol for reconciling differences in classifier features identified using different ML approaches. We describe steps for loading the PathwaySpace R package, preparing input for analysis, and creating density plots of gene sets. We then detail procedures for testing whether apparently distinct feature sets are related in pathway space. For complete details on the use and execution of this protocol, please refer to Ellrott et al.