Omar Mohamed, Dinalankara Wikum, Mulder Lotte, Coady Tendai, Zanettini Claudio, Imada Eddie Luidy, Younes Laurent, Geman Donald, Marchionni Luigi
Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA.
Technical University Delft, 2628 CD Delft, the Netherlands.
iScience. 2023 Feb 2;26(3):106108. doi: 10.1016/j.isci.2023.106108. eCollection 2023 Mar 17.
Many gene signatures have been developed by applying machine learning (ML) to omics profiles; however, their clinical utility is often hindered by limited interpretability and unstable performance. Here, we show the importance of embedding prior biological knowledge in the decision rules yielded by ML approaches to build robust classifiers. We tested this by applying different ML algorithms to gene expression data to predict three difficult cancer phenotypes: bladder cancer progression to muscle-invasive disease, response to neoadjuvant chemotherapy in triple-negative breast cancer, and prostate cancer metastatic progression. We developed two sets of classifiers: mechanistic, in which training was restricted to features capturing specific biological mechanisms; and agnostic, in which training did not use any biological information. Mechanistic models had similar or better testing performance than their agnostic counterparts, with enhanced interpretability. Our findings support the use of biological constraints to develop robust gene signatures with high translational potential.
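The contrast between the two sets of classifiers can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names (`G1`, `G2`, `N1`, `N2`), the toy expression values, and the nearest-centroid classifier, which stands in for the ML algorithms that the abstract does not name. The only idea taken from the text is that the mechanistic model is trained on a biologically motivated subset of features, while the agnostic model is trained on all of them.

```python
from statistics import mean

def restrict(samples, genes):
    """Keep only the listed genes from each sample's expression profile."""
    return [{g: s[g] for g in genes} for s in samples]

def fit_centroids(samples, labels):
    """Nearest-centroid model: per-class mean expression of every gene."""
    centroids = {}
    for label in set(labels):
        members = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = {g: mean(m[g] for m in members) for g in members[0]}
    return centroids

def predict(centroids, sample):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((sample[g] - c[g]) ** 2 for g in c)
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(predict(centroids, s) == y for s, y in zip(samples, labels))
    return hits / len(labels)

# Toy data (hypothetical): G1/G2 play the role of mechanism-related genes,
# N1/N2 the role of genes with no assumed link to the phenotype.
train = [
    {"G1": 5.0, "G2": 1.0, "N1": 3.0, "N2": 2.0},
    {"G1": 4.5, "G2": 1.5, "N1": 0.5, "N2": 3.5},
    {"G1": 1.0, "G2": 5.0, "N1": 2.5, "N2": 0.5},
    {"G1": 1.5, "G2": 4.0, "N1": 1.0, "N2": 3.0},
]
train_labels = [1, 1, 0, 0]
test = [
    {"G1": 4.8, "G2": 1.2, "N1": 1.0, "N2": 1.0},
    {"G1": 1.2, "G2": 4.6, "N1": 2.0, "N2": 3.0},
]
test_labels = [1, 0]

mechanistic_genes = ["G1", "G2"]   # assumed prior-knowledge gene set
all_genes = list(train[0])         # agnostic: no biological filtering

# Mechanistic model: train and test on the restricted feature set only.
mech_model = fit_centroids(restrict(train, mechanistic_genes), train_labels)
mech_acc = accuracy(mech_model, restrict(test, mechanistic_genes), test_labels)

# Agnostic model: train and test on every available feature.
agn_model = fit_centroids(restrict(train, all_genes), train_labels)
agn_acc = accuracy(agn_model, restrict(test, all_genes), test_labels)

print(f"mechanistic accuracy: {mech_acc:.2f}")
print(f"agnostic accuracy:    {agn_acc:.2f}")
```

The toy data carries no evidential weight. The sketch only shows where the biological constraint enters the pipeline: feature selection happens before training, so the fitted decision rule can be read directly in terms of the chosen mechanism-related genes.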