CODI: Enhancing machine learning-based molecular profiling through contextual out-of-distribution integration.

Authors

Eissa Tarek, Huber Marinus, Obermayer-Pietsch Barbara, Linkohr Birgit, Peters Annette, Fleischmann Frank, Žigman Mihaela

Affiliations

Chair of Experimental Physics - Laser Physics, Ludwig-Maximilians-Universität München, Bavaria 85748, Germany.

Laboratory for Attosecond Physics, Max Planck Institute of Quantum Optics, Bavaria 85748, Germany.

Publication information

PNAS Nexus. 2024 Oct 15;3(10):pgae449. doi: 10.1093/pnasnexus/pgae449. eCollection 2024 Oct.

DOI:10.1093/pnasnexus/pgae449
PMID:39440022
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11495219/
Abstract

Molecular analytics increasingly utilize machine learning (ML) for predictive modeling based on data acquired through molecular profiling technologies. However, developing robust models that accurately capture physiological phenotypes is challenged by the dynamics inherent to biological systems, variability stemming from analytical procedures, and the resource-intensive nature of obtaining sufficiently representative datasets. Here, we propose and evaluate a new method: Contextual Out-of-Distribution Integration (CODI). Based on experimental observations, CODI generates synthetic data that integrate unrepresented sources of variation encountered in real-world applications into a given molecular fingerprint dataset. By augmenting a dataset with out-of-distribution variance, CODI enables an ML model to better generalize to samples beyond the seed training data, reducing the need for extensive experimental data collection. Using three independent longitudinal clinical studies and a case-control study, we demonstrate CODI's application to several classification tasks involving vibrational spectroscopy of human blood. We showcase our approach's ability to enable personalized fingerprinting for multiyear longitudinal molecular monitoring and enhance the robustness of trained ML models for improved disease detection. Our comparative analyses reveal that incorporating CODI into the classification workflow consistently leads to increased robustness against data variability and improved predictive accuracy.
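
The abstract does not spell out how CODI constructs its synthetic data, so the following is only an illustrative sketch of the general idea of augmenting a seed dataset with variation it does not contain. It assumes, hypothetically, that the unrepresented variation is available as repeated measurements of the same subjects, that each repeat's deviation from its subject mean is a usable variation vector, and that adding such a vector to a seed spectrum preserves the class label. The function names `within_subject_deviations` and `augment_with_ood_variation`, and the parameters `n_per_sample` and `scale`, are invented for this example and are not taken from the paper.

```python
import random


def within_subject_deviations(repeats_by_subject):
    """Collect each repeat's deviation from its subject's mean spectrum.

    repeats_by_subject maps a subject id to a list of spectra (lists of
    floats of equal length). This is an assumed stand-in for
    experimentally observed variation, not the paper's procedure.
    """
    deviations = []
    for repeats in repeats_by_subject.values():
        n = len(repeats)
        mean = [sum(column) / n for column in zip(*repeats)]
        for spectrum in repeats:
            deviations.append([x - m for x, m in zip(spectrum, mean)])
    return deviations


def augment_with_ood_variation(seed_spectra, seed_labels, deviations,
                               n_per_sample=3, scale=1.0, rng=None):
    """Return the seed data plus synthetic copies perturbed by sampled deviations.

    Each synthetic spectrum is seed + scale * deviation, where the
    deviation is drawn uniformly at random; the seed's label is reused
    (assumption: the added variation is label-preserving).
    """
    rng = rng or random.Random(0)
    synthetic, labels = [], []
    for spectrum, label in zip(seed_spectra, seed_labels):
        for _ in range(n_per_sample):
            deviation = rng.choice(deviations)
            if len(deviation) != len(spectrum):
                raise ValueError("deviation and spectrum lengths differ")
            synthetic.append([x + scale * d for x, d in zip(spectrum, deviation)])
            labels.append(label)
    return list(seed_spectra) + synthetic, list(seed_labels) + labels


if __name__ == "__main__":
    # Toy two-point "spectra"; real spectra would have many more channels.
    devs = within_subject_deviations({"s1": [[1.0, 2.0], [3.0, 4.0]]})
    X, y = augment_with_ood_variation([[0.0, 0.0]], [1], devs, n_per_sample=4)
    print(len(X), len(y))
```

A classifier would then be trained on the augmented set instead of the seed set alone. Whether this helps depends on how well the sampled deviations reflect the variation met at deployment, which is the kind of question the paper's comparative analyses address.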

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6886/11495219/7e9ad2d0b8fd/pgae449f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6886/11495219/9c8761af7d39/pgae449f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6886/11495219/ddd9e6a753c4/pgae449f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6886/11495219/4f8b53885af7/pgae449f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6886/11495219/c779927ba600/pgae449f5.jpg
