Robson Eyes S, Ioannidis Nilah M
Center for Computational Biology, UC Berkeley, Berkeley, CA 94720.
Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720.
bioRxiv. 2024 Mar 7:2023.10.12.562113. doi: 10.1101/2023.10.12.562113.
Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields, including benchmarking, auditing, and algorithmic fairness, are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks. Compared to existing task formulations in computational genomics, GUANinE is large-scale, de-noised, and suitable for evaluating pretrained models. GUANinE v1.0 primarily focuses on functional genomics tasks such as functional element annotation and gene expression prediction, and it also draws upon connections to evolutionary biology through sequence conservation tasks. The current GUANinE tasks provide insight into the performance of existing genomic AI models and non-neural baselines, with opportunities to be refined, revisited, and broadened as the field matures. Finally, the GUANinE benchmark allows us to evaluate new self-supervised T5 models and explore the tradeoffs between tokenization and model performance, while showcasing the potential for self-supervision to complement existing pretraining procedures.