Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data.

Affiliations

Division of Biostatistics, University of California San Diego, La Jolla, California, USA.

IBM T. J. Watson Research Center, Yorktown Heights, New York, USA.

Publication information

Biometrics. 2022 Sep;78(3):1155-1167. doi: 10.1111/biom.13481. Epub 2021 May 19.

DOI:10.1111/biom.13481
PMID:33914902
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9787628/
Abstract

Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above data characteristics, but almost all methods were evaluated based on performance of model predictions. However, little attention has been paid to address a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods often control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on the prediction accuracy. If tiny changes to the data would lead to large changes in the chosen feature subset, then many selected features are likely to be a data artifact rather than real biological signal. This crucial need of identifying relevant and reproducible features motivated the reproducibility evaluation criterion such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with proposed reproducibility criterion Stability in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method.
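
To make the idea of a stability score concrete, here is a minimal sketch of one way such a quantity can be computed. The abstract does not state which formula the authors use, so the choice of the estimator of Nogueira, Sechidis and Brown (2018) is an assumption, and the function name `nogueira_stability` and the toy inputs are hypothetical. Each row of the input records which features a method selected on one perturbed copy of the data (for example, one bootstrap resample).

```python
from typing import Sequence


def nogueira_stability(selections: Sequence[Sequence[int]]) -> float:
    """Stability of a feature selection method across perturbed data sets.

    `selections` is an M x p table of 0/1 flags: selections[i][f] == 1
    means feature f was selected on the i-th resample. This follows the
    Nogueira et al. (2018) estimator (an assumption; the abstract does
    not name a formula).
    """
    m = len(selections)
    if m < 2:
        raise ValueError("need at least two selection runs")
    p = len(selections[0])

    # Selection frequency of each feature across the M runs.
    freq = [sum(row[f] for row in selections) / m for f in range(p)]

    # Unbiased sample variance of each feature's 0/1 selection indicator.
    var = [m / (m - 1) * q * (1 - q) for q in freq]

    # Average number of features selected per run.
    k_bar = sum(freq)

    # Variance expected if k_bar features were picked at random each run.
    denom = (k_bar / p) * (1 - k_bar / p)
    if denom == 0:
        raise ValueError("undefined when every run selects none or all features")

    return 1 - (sum(var) / p) / denom


# Identical selections on every resample: perfectly stable.
print(nogueira_stability([[1, 1, 0, 0]] * 3))              # 1.0
# Two runs with disjoint selections: as unstable as possible for M = 2.
print(nogueira_stability([[1, 1, 0, 0], [0, 0, 1, 1]]))    # -1.0
```

Under this estimator, values near 1 mean the same features are chosen regardless of how the data are perturbed, while values near 0 mean the selections overlap no more than random picks of the same size would.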

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cba7/9787628/0be207a2b6ad/BIOM-78-1155-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cba7/9787628/eb591596978b/BIOM-78-1155-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cba7/9787628/1ee14bedead7/BIOM-78-1155-g002.jpg
