Du Qixiu, Wang Haochen, Jiang Benben, Wang Xiaowo
Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Beijing National Research Center for Information Science and Technology, Tsinghua University, No. 1 Qinghuayuan Street, Haidian District, Beijing 100084, China.
Department of Automation, Tsinghua University, No. 1 Qinghuayuan Street, Haidian District, Beijing 100084, China.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf286.
Using machine learning (ML) models to accelerate experimentation and uncover biological mechanisms is a growing trend in genetic engineering. However, collecting data that effectively improves model accuracy and design remains challenging, especially when data quality is poor and validation resources are limited. Active learning (AL) addresses this by iteratively identifying promising candidates, thereby reducing experimental effort while improving model performance. This review highlights how AL can assist scientists throughout the design-build-test-learn cycle, explores its various practical implementations, and discusses its potential through the integration of cross-domain expertise. In an age of genetic engineering revolutionized by data-driven ML models, AL presents an iterative framework that significantly enhances the functionalities of biomolecules and uncovers their intrinsic mechanisms, all while minimizing cost and effort.
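The iterative candidate selection described in the abstract can be illustrated with a minimal pool-based sketch. This is not the review's method: the bootstrap ridge ensemble, the upper-confidence-bound score (`mean + beta * std`), the synthetic `oracle` standing in for a wet-lab measurement, and all function names and parameters are assumptions chosen only for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression; returns the weight vector."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ensemble_predict(X_train, y_train, X_pool, n_models=20, rng=None):
    """Bootstrap ensemble: mean prediction and its spread over the pool."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample training set with replacement
        w = fit_ridge(X_train[idx], y_train[idx])
        preds.append(X_pool @ w)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

def active_learning_loop(X_pool, oracle, n_init=5, batch_size=5,
                         n_rounds=4, beta=1.0, seed=0):
    """Design-build-test-learn sketch: pick a batch, measure it, refit, repeat."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    y = {i: oracle(X_pool[i]) for i in labeled}  # initial "experiments"
    for _ in range(n_rounds):
        X_train = X_pool[labeled]
        y_train = np.array([y[i] for i in labeled])
        mean, std = ensemble_predict(X_train, y_train, X_pool, rng=rng)
        score = mean + beta * std      # favor high predicted value and high uncertainty
        score[labeled] = -np.inf       # never re-select an already measured candidate
        picks = np.argsort(score)[-batch_size:]
        for i in picks:
            y[int(i)] = oracle(X_pool[i])  # "test" step: measure the candidate
            labeled.append(int(i))
    return labeled, y

# Synthetic stand-in for a candidate library and its assay (assumed, not real data).
data_rng = np.random.default_rng(42)
X_pool = data_rng.normal(size=(200, 8))
true_w = data_rng.normal(size=8)

def oracle(x):
    return float(x @ true_w)

labeled, measurements = active_learning_loop(X_pool, oracle)
```

Setting `beta` to zero makes the selection purely exploitative (highest predicted value), while larger values shift the batch toward candidates the ensemble disagrees on; which trade-off suits a given experimental budget is a design choice the sketch leaves open.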