Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States.
Anal Chem. 2019 Sep 3;91(17):11070-11077. doi: 10.1021/acs.analchem.9b01606. Epub 2019 Aug 13.
"The totality is not, as it were, a mere heap, but the whole is something besides the parts." (Aristotle) We built a classifier that uses the totality of the glycomic profile, rather than a few selected glycoforms, to differentiate samples from two different sources. This approach, which relies on thousands of features, is a radical departure from current strategies, where most of the glycomic profile is ignored in favor of a few features, or even a single feature, meant to capture the differences between sample types. The classifier can be used to differentiate the source of the material; applicable sources may be different species of animals, different protein production methods, or, most importantly, different biological states (disease vs healthy). The classifier can be used on glycomic data in any form, including derivatized monosaccharides, intact glycans, or glycopeptides. It takes advantage of the fact that changing the source material can change the glycomic profile in many subtle ways: some glycoforms can be upregulated, some downregulated, and some may appear unchanged, yet their proportion relative to the other forms present can be altered to a detectable degree. By classifying samples using the entirety of their glycan abundances, along with the glycans' relative proportions to each other, the "Aristotle Classifier" is more effective at capturing the underlying trends than standard classification procedures used in glycomics, including PCA (principal components analysis). It also outperforms workflows where a single, representative glycomic-based biomarker is used to classify samples. We describe the Aristotle Classifier and provide several examples of its utility for biomarker studies and other classification problems using glycomic data from several sources.
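The idea of using every glycan abundance together with the glycans' proportions to each other can be illustrated with a minimal Python sketch. This is not the Aristotle Classifier itself, whose algorithm the abstract does not specify. The function names `profile_features` and `nearest_centroid_predict`, the use of pairwise log-ratios, and the nearest-centroid rule are all assumptions chosen only to show what a whole-profile feature set might look like.

```python
import numpy as np


def profile_features(abundances, eps=1e-9):
    """Combine relative abundances with all pairwise log-ratios.

    Hypothetical feature construction, not the published method.
    abundances: 2-D array, rows = samples, columns = glycoforms.
    """
    x = np.asarray(abundances, dtype=float)
    # Normalize each sample so its abundances sum to 1.
    rel = x / x.sum(axis=1, keepdims=True)
    # Indices of every unordered glycoform pair (i < j).
    i, j = np.triu_indices(rel.shape[1], k=1)
    # A log-ratio per pair records how one glycoform shifts relative to another.
    log_ratios = np.log((rel[:, i] + eps) / (rel[:, j] + eps))
    return np.hstack([rel, log_ratios])


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test sample to the class with the closest mean feature vector.

    A stand-in classifier assumed for illustration only.
    """
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Euclidean distance from each test sample to each class centroid.
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]
```

With n glycoforms, this sketch yields n + n(n-1)/2 features per sample, so a profile of a few dozen glycoforms already produces hundreds to thousands of features, consistent in spirit with the scale the abstract mentions.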