School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool L3 5UX, United Kingdom.
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom.
Proc Natl Acad Sci U S A. 2021 Dec 7;118(49). doi: 10.1073/pnas.2108013118.
Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and letting them each make predictions for each example of the new task, yielding a novel representation. We call this transformational ML (TML). TML is very closely related to, and synergistic with, transfer learning, multitask learning, and stacking. TML is applicable to improving any nonlinear ML method. We tested TML using the most important classes of nonlinear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbors, and neural networks. To ensure the generality and robustness of the evaluation, we utilized thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (4 to 50% average improvements) and that TML features generally outperformed intrinsic features. Use of TML also enhances scientific understanding through explainable ML. In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins. TML leads to an ecosystem-based approach to ML, where new tasks, examples, predictions, and so on synergistically interact to improve performance. To contribute to this ecosystem, all our data, code, and our ∼50,000 ML models have been fully annotated with metadata, linked, and openly published using Findability, Accessibility, Interoperability, and Reusability principles (∼100 Gbytes).
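The core TML procedure described above — train one model per related task on the intrinsic features, then use those models' predictions as an extrinsic feature vector for a new task — can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation; the task setup, random-forest choice, and all parameter values are assumptions for the example.

```python
# Minimal sketch of transformational ML (TML): predictions from models
# trained on related tasks become extrinsic features for a new task.
# Synthetic data; scikit-learn random forests stand in for the paper's
# nonlinear learners.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_examples, n_features, n_related_tasks = 200, 10, 5

# Shared intrinsic features; each related task has its own targets.
X = rng.normal(size=(n_examples, n_features))
task_weights = rng.normal(size=(n_related_tasks, n_features))
related_targets = X @ task_weights.T + 0.1 * rng.normal(
    size=(n_examples, n_related_tasks)
)

# Step 1: train one model per related task on the intrinsic features.
task_models = [
    RandomForestRegressor(n_estimators=50, random_state=0).fit(
        X, related_targets[:, t]
    )
    for t in range(n_related_tasks)
]

# Step 2: let every task model predict on the new task's examples,
# yielding the extrinsic (TML) representation — one feature per model.
X_new = rng.normal(size=(50, n_features))
X_tml = np.column_stack([m.predict(X_new) for m in task_models])

# Step 3: train the final model for the new task on the TML features.
y_new = X_new @ rng.normal(size=n_features)
final_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(
    X_tml, y_new
)

print(X_tml.shape)
```

In practice the intrinsic and extrinsic features can also be concatenated, which the abstract's "synergistic with stacking" remark suggests; here only the extrinsic representation is used to keep the sketch short.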