Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA.
Cell Syst. 2023 Jun 21;14(6):525-542.e9. doi: 10.1016/j.cels.2023.05.007.
The design choices underlying machine-learning (ML) models present important barriers to entry for many biologists who aim to incorporate ML in their research. Automated machine-learning (AutoML) algorithms can address many challenges that come with applying ML to the life sciences. However, these algorithms are rarely used in systems and synthetic biology studies because they typically do not explicitly handle biological sequences (e.g., nucleotide, amino acid, or glycan sequences) and cannot be easily compared with other AutoML algorithms. Here, we present BioAutoMATED, an AutoML platform for biological sequence analysis that integrates multiple AutoML methods into a unified framework. Users are automatically provided with relevant techniques for analyzing, interpreting, and designing biological sequences. BioAutoMATED predicts gene regulation, peptide-drug interactions, and glycan annotation, and designs optimized synthetic biology components, revealing salient sequence characteristics. By automating sequence modeling, BioAutoMATED allows life scientists to incorporate ML more readily into their work.