Department of Biomedical Engineering, Duke University, Durham, North Carolina, United States of America.
Center for Quantitative Biodesign, Duke University, Durham, North Carolina, United States of America.
PLoS Comput Biol. 2024 Jun 3;20(6):e1012185. doi: 10.1371/journal.pcbi.1012185. eCollection 2024 Jun.
Multi-factor screenings are commonly used in diverse applications in medicine and bioengineering, including optimizing combination drug treatments and microbiome engineering. Despite advances in high-throughput technologies, large-scale experiments typically remain prohibitively expensive. Here we introduce a machine learning platform, structure-augmented regression (SAR), that exploits the intrinsic structure of each biological system to learn a high-accuracy model with minimal data requirements. Under different environmental perturbations, each biological system exhibits a unique, structured phenotypic response. This structure can be learned from limited data and, once learned, can constrain subsequent quantitative predictions. We demonstrate that SAR requires significantly less data than existing machine-learning methods to achieve high prediction accuracy, first on simulated data, then on experimental data from various systems and input dimensions. We then show how a learned structure can guide the effective design of new experiments. Our approach has implications for the predictive control of biological systems and for the integration of machine learning prediction with experimental design.