Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery.

Affiliations

College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA.

Intel Corporation, Chandler, AZ, 85226, USA.

Publication information

BMC Bioinformatics. 2020 Mar 11;21(Suppl 2):77. doi: 10.1186/s12859-020-3344-x.

Abstract

BACKGROUND

In biomarker discovery, applying domain knowledge is an effective way to eliminate false-positive features, prioritize functionally impactful markers, and facilitate the interpretation of predictive signatures. Several computational methods formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and their algorithms are optimized for one specific type of biological knowledge. In practice, however, domain knowledge from diverse resources can provide complementary information, yet no current method can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed Know-GRRF (know-guided regularized random forest), a method that enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection.

RESULTS

Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the Akaike information criterion (AIC) of out-of-bag predictions. The objective is to select a compact feature subset that has high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or by no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated the combination of cancer-related gene annotations, evolutionary conservation, and pre-computed statistical scores as the prior knowledge used to assemble a panel of biomarkers. We discovered a compact set of biomarkers that significantly improved prediction accuracy.
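
To make the scoring idea concrete, the following is a minimal sketch and not the KnowGRRF package's implementation. It assumes that the composite score is a weighted sum of per-domain prior scores rescaled to [0, 1], that the score is mapped to a per-feature penalty coefficient with the mapping used by guided regularized random forests (coefficient = 1 - gamma + gamma * score), and that the AIC is computed from a Bernoulli log-likelihood of out-of-bag class probabilities. All function names, parameter names, and numbers are hypothetical.

```python
import math

def composite_scores(priors, weights):
    """Linearly combine per-domain prior scores into one score per feature.

    priors:  one list per knowledge domain, each holding a score per feature
    weights: one weight per domain (hypothetical tuning parameters)
    """
    n_features = len(priors[0])
    raw = [sum(w * domain[j] for w, domain in zip(weights, priors))
           for j in range(n_features)]
    top = max(raw)
    # Rescale so that the best-supported feature gets a score of 1.0.
    return [r / top if top > 0 else 0.0 for r in raw]

def penalty_coefficients(scores, gamma):
    """Map scores to penalty coefficients in (0, 1] (assumed GRRF-style mapping).

    A coefficient near 1 leaves a feature's split gain almost unpenalized;
    a smaller coefficient makes a new feature harder to add to the subset.
    """
    return [(1.0 - gamma) + gamma * s for s in scores]

def aic(y_true, p_pred, n_selected):
    """AIC = 2k - 2 ln(L), with a Bernoulli likelihood as an assumed choice.

    y_true:     0/1 labels
    p_pred:     out-of-bag predicted probabilities of class 1
    n_selected: number of selected features, used as the parameter count k
    """
    eps = 1e-12  # guards against log(0)
    loglik = sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for y, p in zip(y_true, p_pred)
    )
    return 2 * n_selected - 2 * loglik

# Two hypothetical knowledge domains scored over three features.
priors = [[1.0, 0.0, 0.5],   # e.g. gene annotation evidence
          [0.0, 1.0, 0.5]]   # e.g. evolutionary conservation
scores = composite_scores(priors, weights=[2.0, 1.0])
coefs = penalty_coefficients(scores, gamma=0.5)
print(scores, coefs)
```

In a full procedure, an outer search would vary the domain weights and gamma, fit a regularized forest for each setting, and keep the setting with the lowest AIC; that loop is omitted here because it depends on a forest implementation.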

CONCLUSIONS

Know-GRRF is a powerful novel method for incorporating knowledge from multiple domains into feature selection. It has a broad range of applications in biomarker discovery. We implemented the method and released it as the KnowGRRF package on CRAN, the R package archive.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9751/7068914/8f82a66ac69e/12859_2020_3344_Fig1_HTML.jpg
