Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection.

Author information

Quinn Thomas P, Erb Ionas

Affiliations

Independent Scientist, Geelong, Australia

Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.

Publication information

mSystems. 2020 Apr 7;5(2):e00230-19. doi: 10.1128/mSystems.00230-19.

Abstract

Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few approaches explicitly model the relative nature of these data; most instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.

High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves interpretability without sacrificing classifier performance.
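To make the idea of a balance concrete, the following is a minimal sketch of how a 2-part and a 3-part balance could be computed for one sample. It assumes the standard compositional-data definition of a balance (a scaled log ratio of the geometric means of two groups of parts); the abstract does not spell out the formula, so this is an illustration rather than the authors' implementation. The function name `balance`, the taxon labels, and the counts are hypothetical, and all values are assumed to be strictly positive (zeros would need to be replaced beforehand).

```python
import math


def balance(sample, numerator, denominator):
    """Scaled log ratio of the geometric means of two groups of parts.

    Assumes the standard compositional-data definition of a balance;
    `sample` maps part names to strictly positive abundances.
    """
    r, s = len(numerator), len(denominator)
    # Mean of logs equals the log of the geometric mean.
    log_gmean_num = sum(math.log(sample[k]) for k in numerator) / r
    log_gmean_den = sum(math.log(sample[k]) for k in denominator) / s
    return math.sqrt(r * s / (r + s)) * (log_gmean_num - log_gmean_den)


# Hypothetical read counts for three taxa in one sample.
counts = {"taxon_a": 120.0, "taxon_b": 30.0, "taxon_c": 60.0}

# The same sample expressed as proportions (one common normalization).
total = sum(counts.values())
proportions = {name: value / total for name, value in counts.items()}

# 2-part balance: taxon_a against taxon_b.
pair_raw = balance(counts, ["taxon_a"], ["taxon_b"])
pair_prop = balance(proportions, ["taxon_a"], ["taxon_b"])

# 3-part balance: taxon_a against the geometric mean of taxon_b and taxon_c.
trio_raw = balance(counts, ["taxon_a"], ["taxon_b", "taxon_c"])
trio_prop = balance(proportions, ["taxon_a"], ["taxon_b", "taxon_c"])

# Rescaling the sample leaves every balance unchanged.
print(math.isclose(pair_raw, pair_prop), math.isclose(trio_raw, trio_prop))
```

Because each balance depends only on ratios between parts, multiplying the whole sample by any constant (for example, converting counts to proportions) cancels out. That scale invariance is what allows balances to be computed without a prior normalization step.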

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a83/7141889/30176be0bc38/mSystems.00230-19-f0001.jpg
