A generalized covariate-adjusted top-scoring pair algorithm with applications to diabetic kidney disease stage classification in the Chronic Renal Insufficiency Cohort (CRIC) Study.

Affiliations

Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA.

Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA.

Publication information

BMC Bioinformatics. 2023 Feb 20;24(1):57. doi: 10.1186/s12859-023-05171-w.

DOI:10.1186/s12859-023-05171-w
PMID:36803209
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9942303/
Abstract

BACKGROUND

The growing amount of high-dimensional biomolecular data has spawned new statistical and computational models for risk prediction and disease classification. Yet many of these methods do not yield biologically interpretable models, despite offering high classification accuracy. One exception is the top-scoring pair (TSP) algorithm, which derives parameter-free, biologically interpretable single-pair decision rules that are accurate and robust in disease classification. However, standard TSP methods do not accommodate covariates that could heavily influence feature selection for the top-scoring pair. Herein, we propose a covariate-adjusted TSP method, which uses residuals from a regression of features on the covariates to identify top-scoring pairs. We conduct simulations and a data application to investigate our method, and compare it to two existing classifiers, LASSO and random forests.
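
For readers unfamiliar with TSP, the selection criterion can be written out as follows. This is a sketch in the standard TSP formulation; the symbols are illustrative and do not come from the abstract, and the linear form of the regression in the adjusted variant is an assumption, since the abstract says only "a regression of features on the covariates".

```latex
% X_i: feature i;  Y \in \{0,1\}: class label;  Z: covariate vector (illustrative notation)
\begin{align}
  p_{ij}(c)   &= \Pr(X_i < X_j \mid Y = c), \qquad c \in \{0, 1\} \\
  \Delta_{ij} &= \lvert\, p_{ij}(1) - p_{ij}(0) \,\rvert \\
  (i^*, j^*)  &= \arg\max_{i \neq j} \Delta_{ij} \\
  % Covariate-adjusted variant, assuming an ordinary linear regression of each feature on Z:
  R_i &= X_i - \hat{\beta}_{0i} - Z^\top \hat{\beta}_i \\
  \Delta^{\mathrm{adj}}_{ij} &= \lvert\, \Pr(R_i < R_j \mid Y = 1) - \Pr(R_i < R_j \mid Y = 0) \,\rvert
\end{align}
```

The resulting classifier is a single comparison: a new sample is assigned to whichever class more often shows the observed ordering of the selected pair, which is why the rule has no tunable parameters.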

RESULTS

Our simulations found that features that were highly correlated with clinical variables had a high likelihood of being selected as top-scoring pairs in the standard TSP setting. However, through residualization, our covariate-adjusted TSP was able to identify new top-scoring pairs that were largely uncorrelated with clinical variables. In the data application, using patients with diabetes (n = 977) selected for metabolomic profiling in the Chronic Renal Insufficiency Cohort (CRIC) study, the standard TSP algorithm identified (valine-betaine, dimethyl-arg) as the top-scoring metabolite pair for classifying diabetic kidney disease (DKD) severity, whereas the covariate-adjusted TSP method identified the pair (pipazethate, octaethylene glycol) as top-scoring. Valine-betaine and dimethyl-arg had, respectively, ≥ 0.4 absolute correlation with urine albumin and serum creatinine, known prognosticators of DKD. Thus, without covariate adjustment, the top-scoring pair largely reflected known markers of disease severity, whereas covariate-adjusted TSP uncovered features liberated from confounding and identified independent prognostic markers of DKD severity. Furthermore, TSP-based methods achieved classification accuracy for DKD that was competitive with LASSO and random forests, while providing more parsimonious models.

CONCLUSIONS

We extended TSP-based methods to account for covariates via a simple, easy-to-implement residualizing process. Our covariate-adjusted TSP method identified metabolite features, uncorrelated with clinical covariates, that discriminate DKD severity stage based on the relative ordering between two features, and thus provides insights for future studies on the order reversals in early vs advanced disease states.
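
The residualizing process described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes ordinary least squares with an intercept for the regression step, a binary class label, and an exhaustive search over pairs with no tie-breaking rule. The function names `residualize` and `top_scoring_pair` and the simulated data are hypothetical.

```python
import numpy as np


def residualize(X, Z):
    """Residuals of each column of X after OLS regression on covariates Z.

    Assumption: a linear model with an intercept; the abstract does not
    specify the form of the regression.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(X), -1)
    Z1 = np.column_stack([np.ones(len(Z)), Z])      # add intercept column
    beta, *_ = np.linalg.lstsq(Z1, X, rcond=None)   # one fit per feature
    return X - Z1 @ beta


def top_scoring_pair(X, y):
    """Return (i, j, score) maximizing |P(X_i < X_j | y=1) - P(X_i < X_j | y=0)|.

    Assumes both classes are present. Ties keep the first pair found.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    best = (-1, -1, -1.0)
    for i in range(X.shape[1] - 1):
        # less[n, k] is True when feature i is below feature i+1+k in sample n
        less = X[:, [i]] < X[:, i + 1:]
        score = np.abs(less[y].mean(axis=0) - less[~y].mean(axis=0))
        k = int(np.argmax(score))
        if score[k] > best[2]:
            best = (i, i + 1 + k, float(score[k]))
    return best


if __name__ == "__main__":
    # Simulated data for illustration only (not the CRIC data).
    rng = np.random.default_rng(0)
    n = 500
    z = rng.normal(size=(n, 1))                 # one hypothetical clinical covariate
    y = (z[:, 0] + rng.normal(size=n)) > 0      # class label partly driven by z
    X = rng.normal(size=(n, 6))
    X[:, 0] += 2 * z[:, 0]                      # feature 0 tracks the covariate
    X[:, 4] += np.where(y, 0.8, -0.8)           # feature 4 shifts with the class
    print("standard TSP:", top_scoring_pair(X, y))
    print("adjusted TSP:", top_scoring_pair(residualize(X, z), y))
```

Running the search on the residuals rather than the raw features is the only difference between the two calls, which is what makes the adjustment easy to bolt onto an existing TSP pipeline.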

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e469/9942303/213101cf040a/12859_2023_5171_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e469/9942303/496eac58b871/12859_2023_5171_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e469/9942303/83b16c36bd1d/12859_2023_5171_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e469/9942303/d81592df015c/12859_2023_5171_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e469/9942303/284955dc73ec/12859_2023_5171_Fig5_HTML.jpg
