Multi-TGDR, a multi-class regularization method, identifies the metabolic profiles of hepatocellular carcinoma and cirrhosis infected with hepatitis B or hepatitis C virus.

Affiliation

Division of Clinical Epidemiology, First Hospital of the Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China.

Publication information

BMC Bioinformatics. 2014 Apr 4;15:97. doi: 10.1186/1471-2105-15-97.

Abstract

BACKGROUND

Over the last decade, metabolomics has evolved into a mainstream enterprise used by many laboratories worldwide. Like other "omics" data, metabolomics data typically contain fewer samples than features evaluated. Selecting an optimal subset of features for a supervised classifier is therefore imperative. We extended an existing feature selection algorithm, threshold gradient descent regularization (TGDR), to handle multi-class classification of "omics" data, and proposed two such extensions, both referred to as multi-TGDR. Both multi-TGDR frameworks were used to analyze a metabolomics dataset that compares the metabolic profiles of hepatocellular carcinoma (HCC) in patients infected with hepatitis B virus (HBV) or hepatitis C virus (HCV) with those of cirrhosis induced by HBV/HCV infection; the goal was to improve early-stage diagnosis of HCC.

RESULTS

We applied two multi-TGDR frameworks to the HCC metabolomics data: one determines the TGDR thresholds globally across classes, the other locally for each class. The multi-TGDR global model selected 45 metabolites, with a 0% misclassification rate (the error rate on the training data) and a 3.82% 5-fold cross-validation (CV-5) predictive error rate. The multi-TGDR local model selected 48 metabolites, with a 0% misclassification rate and a 5.34% CV-5 error rate.
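The abstract does not give the update rule, so the following is only a minimal sketch of how a global and a local threshold could differ. It assumes the general TGDR idea (at each gradient step, update only those coefficients whose gradient magnitude is at least a fraction tau of the largest one) applied to a multinomial logistic model. The names multi_tgdr, selected_features, tau, step and n_iter are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def multi_tgdr(X, y, n_classes, tau=0.8, step=0.01, n_iter=200, mode="global"):
    """Thresholded gradient ascent on a multinomial log-likelihood (a sketch).

    X: (n_samples, n_features) array; y: integer labels in [0, n_classes).
    mode="global": one cutoff shared by all classes.
    mode="local":  one cutoff per class (per column of the coefficients).
    """
    n, p = X.shape
    Y = np.eye(n_classes)[y]          # one-hot labels, shape (n, n_classes)
    B = np.zeros((p, n_classes))      # coefficients start at zero
    for _ in range(n_iter):
        P = softmax(X @ B)
        G = X.T @ (Y - P) / n         # gradient of the mean log-likelihood
        A = np.abs(G)
        if mode == "global":
            cutoff = tau * A.max()                         # scalar cutoff
        else:
            cutoff = tau * A.max(axis=0, keepdims=True)    # one per class
        B += step * G * (A >= cutoff)  # update only coefficients above cutoff
    return B


def selected_features(B):
    """Indices of features with a nonzero coefficient for at least one class."""
    return np.flatnonzero(np.any(B != 0, axis=1))
```

Under these assumptions, the columns of the returned coefficient matrix can be read class by class: a feature whose coefficient is nonzero only in one column was picked up by that class's gradient alone, which is the kind of class-specific reading a local threshold makes easier.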

CONCLUSIONS

One important advantage of multi-TGDR local is that it allows inference on which features are related specifically to which class or classes. We therefore recommend multi-TGDR local: it has similar predictive performance and requires the same computing time as multi-TGDR global, but may also provide class-specific inference.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fe6/4234477/14e42090a8f7/1471-2105-15-97-1.jpg
