A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data.

Affiliation

Biodesix Inc, 2970 Wilderness Pl, Ste100, Boulder, CO, 80301, USA.

Publication information

BMC Bioinformatics. 2019 Jun 13;20(1):325. doi: 10.1186/s12859-019-2922-2.

Abstract

BACKGROUND

Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, the design of precision medicine tests for unmet clinical needs from this information in the small cohorts available for test discovery remains a challenging task. Obtaining reliable performance assessments at the earliest stages of test development can also be problematic. We describe a novel approach to classifier development designed to create clinically useful tests together with reliable estimates of their performance. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization.
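To make the combination of ideas named above more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that the "atomic" weak classifiers are single-feature nearest-centroid voters, that they are filtered by training accuracy, that dropout means repeatedly fitting a logistic regression on a small random subset of the surviving voters and averaging the weights, and that bagging means repeating this over random subsamples. The abstract does not specify any of these details; every function name, parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def make_toy_data(n, p=200, n_informative=10, shift=2.0, seed=0):
    """Synthetic data with more features than samples; few are informative."""
    rng = np.random.default_rng(seed)
    y = np.arange(n) % 2                      # balanced 0/1 labels
    X = rng.normal(size=(n, p))
    X[:, :n_informative] += shift * y[:, None]
    return X, y

def atomic_outputs(X, centroids):
    """One weak classifier per feature: vote +1 if the value is nearer the
    class-1 centroid of that feature, otherwise -1."""
    c0, c1 = centroids
    return np.where(np.abs(X - c1) < np.abs(X - c0), 1.0, -1.0)

def fit_logistic(Z, y, steps=100, lr=0.5):
    """Plain gradient-descent logistic regression without an intercept."""
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w)))
        w -= lr * Z.T @ (p - y) / len(y)
    return w

def fit_bag(X, y, rng, n_dropout=100, keep=5, min_acc=0.6):
    """Fit one bag: filter weak classifiers, then average many small
    logistic regressions, each seeing only `keep` random survivors."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Z = atomic_outputs(X, (c0, c1))
    acc = ((Z > 0) == (y[:, None] == 1)).mean(axis=0)
    passed = np.flatnonzero(acc >= min_acc)   # keep only useful voters
    w = np.zeros(X.shape[1])
    for _ in range(n_dropout):
        sub = rng.choice(passed, size=min(keep, len(passed)), replace=False)
        w[sub] += fit_logistic(Z[:, sub], y)  # all other voters dropped out
    return (c0, c1), w / n_dropout

def fit_ensemble(X, y, n_bags=10, frac=0.67, seed=0):
    """Bagging: repeat the dropout-regularized fit on random subsamples."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_bags):
        idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
        models.append(fit_bag(X[idx], y[idx], rng))
    return models

def predict_proba(models, X):
    """Average the class-1 probabilities of all bagged models."""
    probs = [1.0 / (1.0 + np.exp(-(atomic_outputs(X, c) @ w)))
             for c, w in models]
    return np.mean(probs, axis=0)
```

A call such as `fit_ensemble(*make_toy_data(40))` trains on 40 samples with 200 features, the "fewer samples than attributes" setting the abstract refers to. The sketch does not stratify the subsamples, so it relies on each bag containing both classes, which holds for this balanced toy data.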

RESULTS

We apply this dropout-regularized combination approach to two clinical problems in oncology using mRNA expression and associated clinical data, and compare its performance with that of other methods of classifier generation, including Random Forest. Performance of the new method is similar to or better than that of Random Forest in the two classification tasks used for comparison. The dropout-regularized combination method also generates an effective classifier in a classification task with a known confounding variable. Most importantly, it provides a reliable estimate of test performance from a relatively small development set of samples.

CONCLUSIONS

The flexible dropout-regularized combination approach is able to produce tests tailored to particular clinical questions and mitigate known confounding effects. It allows the design of molecular diagnostic tests addressing particular clinical questions together with reliable assessment of whether test performance is likely to be fit-for-purpose in independent validation at the earliest stages of development.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d19/6567499/5acaccb64d67/12859_2019_2922_Fig1_HTML.jpg
