A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data.

Affiliation

Biodesix Inc, 2970 Wilderness Pl, Ste100, Boulder, CO, 80301, USA.

Publication information

BMC Bioinformatics. 2019 Jun 13;20(1):325. doi: 10.1186/s12859-019-2922-2.

DOI:10.1186/s12859-019-2922-2

PMID:31196002

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6567499/

Abstract

BACKGROUND

Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, the design of precision medicine tests for unmet clinical needs from this information in the small cohorts available for test discovery remains a challenging task. Obtaining reliable performance assessments at the earliest stages of test development can also be problematic. We describe a novel approach to classifier development designed to create clinically useful tests together with reliable estimates of their performance. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization.
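
The abstract names the ingredients of the method (a hierarchy of classifiers, boosting, bagging, strong dropout regularization) but does not spell them out. The sketch below illustrates only a dropout-regularized combination step. It rests on assumptions that this abstract does not state:

- The base ("mini") classifiers are k-nearest-neighbour rules on single features.
- They are filtered by a training-accuracy threshold.
- They are combined by a logistic regression fitted many times on small random subsets of them, with the fitted parameters averaged.

All names and parameter values (`knn_mini_classifier`, `dropout_combination`, `min_accuracy=0.6`, `n_leave_in=3`, `n_iter=100`) are hypothetical. This is not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def knn_mini_classifier(X: Sequence[Sequence[float]], y: Sequence[int],
                        feature: int, k: int = 5) -> Predictor:
    """Hypothetical mini-classifier: k-NN majority vote using one feature only."""
    def predict(x: Sequence[float]) -> int:
        nearest = sorted((abs(row[feature] - x[feature]), label)
                         for row, label in zip(X, y))[:k]
        return 1 if 2 * sum(label for _, label in nearest) > k else 0
    return predict


def fit_logistic(Z: List[List[int]], y: Sequence[int],
                 epochs: int = 150, lr: float = 0.5) -> List[float]:
    """Batch gradient-descent logistic regression; returns [bias, w_1, ..., w_m]."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, label in zip(Z, y):
            err = sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z))) - label
            grad[0] += err
            for j, zj in enumerate(z):
                grad[j + 1] += err * zj
        w = [wj - lr * g / len(y) for wj, g in zip(w, grad)]
    return w


def dropout_combination(X, y, k: int = 5, min_accuracy: float = 0.6,
                        n_leave_in: int = 3, n_iter: int = 100,
                        seed: int = 0) -> Callable[[Sequence[float]], float]:
    """Combine filtered single-feature mini-classifiers with dropout-averaged
    logistic regression (a hypothetical sketch, not the published procedure)."""
    rng = random.Random(seed)
    minis = [knn_mini_classifier(X, y, f, k) for f in range(len(X[0]))]
    outputs = [[mc(x) for x in X] for mc in minis]  # outputs[m][i]: mini m on sample i
    # Filtering step: keep mini-classifiers that reach the accuracy threshold.
    keep = [m for m, out in enumerate(outputs)
            if sum(o == t for o, t in zip(out, y)) / len(y) >= min_accuracy]
    keep = keep or list(range(len(minis)))  # fall back to all if none pass
    n_leave_in = min(n_leave_in, len(keep))
    bias, weights = 0.0, {m: 0.0 for m in keep}
    for _ in range(n_iter):
        # Strong dropout: each logistic fit sees only a few mini-classifiers.
        chosen = rng.sample(keep, n_leave_in)
        w = fit_logistic([[outputs[m][i] for m in chosen] for i in range(len(y))], y)
        bias += w[0] / n_iter
        for m, wm in zip(chosen, w[1:]):
            weights[m] += wm / n_iter  # dropped-out minis contribute weight 0

    def predict_proba(x: Sequence[float]) -> float:
        """Probability of class 1 from the averaged logistic model."""
        return sigmoid(bias + sum(wm * minis[m](x) for m, wm in weights.items()))
    return predict_proba
```

To use it, build `model = dropout_combination(X, y)` from a development set `X` (a list of feature vectors) with binary labels `y`, then call `model(x)` to get a class-1 probability. Each logistic fit sees only `n_leave_in` mini-classifiers, so no single fit can tune many features jointly. Averaging over iterations then spreads weight across all retained mini-classifiers, which is the regularizing effect the dropout is meant to provide.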

RESULTS

We apply this dropout-regularized combination approach to two clinical problems in oncology using mRNA expression and associated clinical data and compare performance with other methods of classifier generation, including Random Forest. Performance of the new method is similar to or better than the Random Forest in the two classification tasks used for comparison. The dropout-regularized combination method also generates an effective classifier in a classification task with a known confounding variable. Most importantly, it provides a reliable estimate of test performance from a relatively small development set of samples.
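
The abstract says the method yields a reliable performance estimate from a small development set, but not how. One common mechanism when bagging is an out-of-bag estimate, and it is assumed here rather than taken from the paper. The classifier is trained on many random stratified half-splits. Each sample is then scored only by the models that did not see it, and those held-out calls are majority-voted. In the sketch, `nearest_centroid` is a simple stand-in for the real classifier. The names `out_of_bag_accuracy` and `n_splits` are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Sequence[float]], List[int]], Model]


def nearest_centroid(X: List[Sequence[float]], y: List[int]) -> Model:
    """Stand-in base classifier: predict the class whose mean vector is closer."""
    centroids = {c: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == c])]
                 for c in (0, 1)}

    def predict(x: Sequence[float]) -> int:
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, cen))
                for c, cen in centroids.items()}
        return 0 if dist[0] <= dist[1] else 1
    return predict


def out_of_bag_accuracy(X, y, train: Trainer, n_splits: int = 50,
                        seed: int = 0) -> float:
    """Bag over stratified random half-splits; score each sample only with the
    models that did not train on it, then majority-vote those held-out calls."""
    rng = random.Random(seed)
    held_out_votes: List[List[int]] = [[] for _ in y]
    by_class = {c: [i for i, t in enumerate(y) if t == c] for c in set(y)}
    for _ in range(n_splits):
        # Stratified split: half of each class goes into the training half.
        train_idx = [i for idx in by_class.values()
                     for i in rng.sample(idx, len(idx) // 2)]
        in_train = set(train_idx)
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, x in enumerate(X):
            if i not in in_train:
                held_out_votes[i].append(model(x))
    # Samples never held out (possible for very few splits) are skipped.
    scored = [(1 if mean(v) > 0.5 else 0, y[i])
              for i, v in enumerate(held_out_votes) if v]
    return sum(pred == t for pred, t in scored) / len(scored)
```

Any function that maps a training set to a 0/1 predictor can be passed as `train`. Each class needs at least two samples so that every training half contains both classes.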

CONCLUSIONS

The flexible dropout-regularized combination approach is able to produce tests tailored to particular clinical questions and mitigate known confounding effects. It allows the design of molecular diagnostic tests addressing particular clinical questions together with reliable assessment of whether test performance is likely to be fit-for-purpose in independent validation at the earliest stages of development.

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d19/6567499/5acaccb64d67/12859_2019_2922_Fig1_HTML.jpg
