School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
Computational Systems Immunology, Pfizer Worldwide R&D, Cambridge, MA, USA.
Bioinformatics. 2019 Sep 15;35(18):3263-3272. doi: 10.1093/bioinformatics/btz112.
Patient stratification methods are key to the vision of precision medicine. Here, we use transcriptional data to segment the patient population into subsets relevant to a given phenotype. Whereas most existing patient stratification methods focus on either predictive performance or interpretable features, we developed a method that strikes a balance between these two important goals.
We introduce a Bayesian method called SUBSTRA that uses regularized biclustering to identify patient subtypes and interpretable subtype-specific transcript clusters. The method iteratively re-weights feature importance to optimize phenotype prediction performance by producing more phenotype-relevant patient subtypes. We investigate the performance of SUBSTRA in finding relevant features using simulated data and successfully benchmark it against state-of-the-art unsupervised stratification methods and supervised alternatives. Moreover, SUBSTRA achieves predictive performance competitive with the supervised benchmark methods and provides interpretable transcriptional features in diverse biological settings, such as drug response prediction, cancer diagnosis, or kidney transplant rejection.
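The re-weighting idea described above can be illustrated with a deliberately simplified toy. The sketch below is not SUBSTRA and is not taken from the authors' R code: it swaps the Bayesian regularized biclustering for a plain two-cluster k-means over patients with a feature-weighted distance, and it assumes a multiplicative weight update driven by the gap between phenotype class means. All function names, the `eta` step size, and the tiny data set are hypothetical and chosen only for illustration.

```python
import math

def weighted_sq_dist(a, b, w):
    """Squared Euclidean distance with one weight per feature."""
    return sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b))

def weighted_kmeans2(X, w, max_iter=20):
    """Two-cluster k-means over patients (a stand-in for biclustering)."""
    # Deterministic seeding: the first patient and the patient farthest from it.
    far = max(range(len(X)), key=lambda i: weighted_sq_dist(X[i], X[0], w))
    centroids = [list(X[0]), list(X[far])]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min((0, 1), key=lambda k: weighted_sq_dist(x, centroids[k], w))
            for x in X
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, y):
    """Fraction of patients whose phenotype is the majority in their cluster."""
    total = 0
    for k in set(labels):
        phen = [yi for yi, lab in zip(y, labels) if lab == k]
        total += max(phen.count(c) for c in set(phen))
    return total / len(y)

def reweight(w, X, y, eta):
    """Assumed update rule: boost features whose class means differ (binary y)."""
    c0, c1 = sorted(set(y))
    new_w = []
    for j, wj in enumerate(w):
        m0 = [x[j] for x, yi in zip(X, y) if yi == c0]
        m1 = [x[j] for x, yi in zip(X, y) if yi == c1]
        gap = abs(sum(m1) / len(m1) - sum(m0) / len(m0))
        new_w.append(wj * math.exp(eta * gap))
    s = sum(new_w)
    return [v / s for v in new_w]

def stratify(X, y, max_rounds=10, eta=2.0):
    """Alternate clustering and re-weighting until clusters match the phenotype."""
    w = [1.0 / len(X[0])] * len(X[0])
    history = []
    for _ in range(max_rounds):
        labels = weighted_kmeans2(X, w)
        history.append(purity(labels, y))
        if history[-1] == 1.0:
            break
        w = reweight(w, X, y, eta)
    return labels, w, history

# Hypothetical data: feature 0 tracks the phenotype, feature 1 is
# high-variance noise that dominates an unweighted clustering.
X = [[0.0, 0.0], [0.1, 5.0], [0.0, 0.2], [0.2, 5.1],
     [1.0, 0.1], [1.1, 5.0], [0.9, 0.0], [1.0, 4.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

labels, weights, history = stratify(X, y)
print("purity per round:", history)
print("final feature weights:", [round(v, 4) for v in weights])
```

With uniform weights the clusters follow the noisy feature and are uninformative about the phenotype. As the weight of the phenotype-associated feature grows over the rounds, the patient clusters realign with the phenotype, which is the qualitative behaviour the abstract attributes to iterative re-weighting.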
The R code of SUBSTRA is available at https://github.com/sahandk/SUBSTRA.
Supplementary data are available at Bioinformatics online.