Extending Classification Algorithms to Case-Control Studies.

Author information

Stanfill Bryan, Reehl Sarah, Bramer Lisa, Nakayasu Ernesto S, Rich Stephen S, Metz Thomas O, Rewers Marian, Webb-Robertson Bobbie-Jo

Affiliations

Computing and Analytics Division, National Security Directorate, Pacific Northwest National Laboratory, Richland, WA, USA.

Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA.

Publication information

Biomed Eng Comput Biol. 2019 Jul 15;10:1179597219858954. doi: 10.1177/1179597219858954. eCollection 2019.

DOI:10.1177/1179597219858954

PMID:31320812

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6630079/

Abstract

Classification is a common technique applied to 'omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but its modeling assumptions limit its ability to identify a large class of sufficiently complicated 'omic signatures. We propose a general data preprocessing step that makes any linear or nonlinear classification algorithm, even those not typically appropriate for matched-design data, available for modeling case-control data and identifying relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification accuracy and the variable selection accuracy of each method are improved after applying this processing step, and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.
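
The abstract does not spell out the preprocessing step itself. As a purely illustrative sketch, and an assumption rather than the authors' published procedure, one familiar way to account for matching before running an ordinary classifier is to center each feature within its matched set, so that only within-set differences between cases and controls remain. The function name `center_within_strata` and the toy data below are hypothetical.

```python
import numpy as np

def center_within_strata(X, strata):
    """Subtract each matched set's feature means from the rows in that set.

    Illustrative assumption only: this is not taken from the paper.
    X      : (n_samples, n_features) array of measurements
    strata : length-n_samples array of matched-set labels
    """
    X = np.asarray(X, dtype=float)
    strata = np.asarray(strata).ravel()
    # inverse[i] is the index of the matched set that row i belongs to
    _, inverse = np.unique(strata, return_inverse=True)
    n_strata = inverse.max() + 1
    # Accumulate per-set feature sums, then divide by set sizes to get means
    sums = np.zeros((n_strata, X.shape[1]))
    np.add.at(sums, inverse, X)
    counts = np.bincount(inverse, minlength=n_strata)[:, None]
    means = sums / counts
    return X - means[inverse]

# Toy data: two 1:1 matched pairs (rows 0-1 and rows 2-3), two features
X = np.array([[5.0, 1.0], [3.0, 1.0], [10.0, 4.0], [8.0, 2.0]])
strata = np.array([0, 0, 1, 1])
Xc = center_within_strata(X, strata)
```

After centering, every feature sums to zero within each matched set, so a downstream classifier sees only how a case differs from its matched controls. Whether this corresponds to the step proposed in the paper should be checked against the full text.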

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/970a/6630079/830554a2388e/10.1177_1179597219858954-fig1.jpg

Similar articles

VSOLassoBag: a variable-selection oriented LASSO bagging algorithm for biomarker discovery in omic-based translational research.

J Genet Genomics. 2023 Mar;50(3):151-162. doi: 10.1016/j.jgg.2022.12.005. Epub 2023 Jan 3.

A robust data scaling algorithm to improve classification accuracies in biomedical data.

BMC Bioinformatics. 2016 Sep 9;17(1):359. doi: 10.1186/s12859-016-1236-x.

Systematic Comparison of the Influence of Different Data Preprocessing Methods on the Performance of Gait Classifications Using Machine Learning.

Front Bioeng Biotechnol. 2020 Apr 15;8:260. doi: 10.3389/fbioe.2020.00260. eCollection 2020.

A Comparative Performance Assessment of Optimized Multilevel Ensemble Learning Model with Existing Classifier Models.

Big Data. 2022 Oct;10(5):371-387. doi: 10.1089/big.2021.0257. Epub 2021 Dec 8.

SVM-RFE: selection and visualization of the most relevant features through non-linear kernels.

BMC Bioinformatics. 2018 Nov 19;19(1):432. doi: 10.1186/s12859-018-2451-4.

Comparison of data augmentation and classification algorithms based on plastic spectroscopy.

Anal Methods. 2025 Feb 6;17(6):1236-1251. doi: 10.1039/d4ay01759e.

Omics Data Preprocessing for Machine Learning: A Case Study in Childhood Obesity.

Genes (Basel). 2023 Jan 18;14(2):248. doi: 10.3390/genes14020248.

Identification of biomarkers for risk stratification of cardiovascular events using genetic algorithm with recursive local floating search.

Proteomics. 2009 Apr;9(8):2286-94. doi: 10.1002/pmic.200700867.

Identifying optimal biomarker combinations for treatment selection through randomized controlled trials.

Clin Trials. 2015 Aug;12(4):348-56. doi: 10.1177/1740774515580126. Epub 2015 May 6.

Cited by

Pre-diagnostic serum metabolome and breast cancer risk: a nested case-control study.

Breast Cancer Res. 2025 Aug 27;27(1):156. doi: 10.1186/s13058-025-02102-w.

PEDI: Towards Efficient Pathway Enrichment and Data Integration in Bioinformatics for Healthcare Using Deep Learning Optimisation.

Biomed Eng Comput Biol. 2025 Feb 28;16:11795972251321684. doi: 10.1177/11795972251321684. eCollection 2025.

RNA sequencing identifies and as predictive genes of aging CD264 human mesenchymal stem cells at an early passage.

Cytotechnology. 2025 Apr;77(2):63. doi: 10.1007/s10616-025-00724-8. Epub 2025 Feb 19.

Machine learning models based on fluid immunoproteins that predict non-AIDS adverse events in people with HIV.

iScience. 2024 May 8;27(6):109945. doi: 10.1016/j.isci.2024.109945. eCollection 2024 Jun 21.

The predictive power of data: machine learning analysis for Covid-19 mortality based on personal, clinical, preclinical, and laboratory variables in a case-control study.

BMC Infect Dis. 2024 Apr 18;24(1):411. doi: 10.1186/s12879-024-09298-w.

The Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: Systematic Review.

J Med Internet Res. 2024 Jan 19;26:e47430. doi: 10.2196/47430.

Plasma protein biomarkers predict the development of persistent autoantibodies and type 1 diabetes 6 months prior to the onset of autoimmunity.

Cell Rep Med. 2023 Jul 18;4(7):101093. doi: 10.1016/j.xcrm.2023.101093. Epub 2023 Jun 29.

EPIMUTESTR: a nearest neighbor machine learning approach to predict cancer driver genes from the evolutionary action of coding variants.

Nucleic Acids Res. 2022 Jul 8;50(12):e70. doi: 10.1093/nar/gkac215.

Prediction of Type 1 Diabetes at Birth: Cord Blood Metabolites vs Genetic Risk Score in the Norwegian Mother, Father, and Child Cohort.

J Clin Endocrinol Metab. 2021 Sep 27;106(10):e4062-e4071. doi: 10.1210/clinem/dgab400.

Lipid Profiles and Heart Failure Risk: Results From Two Prospective Studies.

Circ Res. 2021 Feb 5;128(3):309-320. doi: 10.1161/CIRCRESAHA.120.317883. Epub 2020 Dec 4.

References

Bayesian analysis of pair-matched case-control studies subject to outcome misclassification.

Stat Med. 2017 Nov 20;36(26):4196-4213. doi: 10.1002/sim.7427. Epub 2017 Aug 7.

Defective methionine metabolism in the brain after repeated blast exposures might contribute to increased oxidative stress.

Neurochem Int. 2018 Jan;112:234-238. doi: 10.1016/j.neuint.2017.07.014. Epub 2017 Jul 31.

Fatty acid status in infancy is associated with the risk of type 1 diabetes-associated autoimmunity.

Diabetologia. 2017 Jul;60(7):1223-1233. doi: 10.1007/s00125-017-4280-9. Epub 2017 May 4.

ω-3 polyunsaturated fatty acids ameliorate type 1 diabetes and autoimmunity.

J Clin Invest. 2017 May 1;127(5):1757-1771. doi: 10.1172/JCI87388. Epub 2017 Apr 4.

Bayesian Variable Selection Methods for Matched Case-Control Studies.

Int J Biostat. 2017 Jan 31;13(1):/j/ijb.2017.13.issue-1/ijb-2016-0043/ijb-2016-0043.xml. doi: 10.1515/ijb-2016-0043.

α-Hydroxybutyric Acid Is a Selective Metabolite Biomarker of Impaired Glucose Tolerance.

Diabetes Care. 2016 Jun;39(6):988-95. doi: 10.2337/dc15-2752. Epub 2016 Apr 5.

Upregulation of lncRNA MEG3 promotes hepatic insulin resistance via increasing FoxO1 expression.

Biochem Biophys Res Commun. 2016 Jan 8;469(2):319-25. doi: 10.1016/j.bbrc.2015.11.048. Epub 2015 Nov 19.

Downregulation of Long Noncoding RNA Meg3 Affects Insulin Synthesis and Secretion in Mouse Pancreatic Beta Cells.

J Cell Physiol. 2016 Apr;231(4):852-62. doi: 10.1002/jcp.25175. Epub 2015 Sep 9.

Regularization Paths for Conditional Logistic Regression: The clogitL1 Package.

J Stat Softw. 2014 Jul;58(12).

Sparse conditional logistic regression for analyzing large-scale matched data from epidemiological studies: a simple algorithm.

BMC Bioinformatics. 2015;16 Suppl 6(Suppl 6):S1. doi: 10.1186/1471-2105-16-S6-S1. Epub 2015 Apr 17.
