Delcher Haley A, Alsatari Enas S, Haastrup Adeyeye I, Naaz Sayema, Hayes-Guastella Lydia A, McDaniel Autumn M, Clark Olivia G, Katerski Devin M, Prinsloo Francois O, Roberts Olivia R, Shaddix Meredith A, Sullivan Bridgette N, Swan Isabella M, Hartsell Emily M, DeMeis Jeffrey D, Paudel Sunita S, Borchert Glen M
Department of Pharmacology, University of South Alabama, Mobile, Alabama, USA.
Stokes School of Marine and Environmental Sciences, University of South Alabama, Mobile, Alabama, USA.
Biochem Mol Biol Educ. 2025 Jul-Aug;53(4):433-444. doi: 10.1002/bmb.21899. Epub 2025 May 5.
Because of the size of many genomes and the growing size of sequencing files, independently analyzing sequencing data is largely impossible for a biologist with little to no programming expertise. Biologists therefore typically face a dilemma: either spend significant time and effort learning to program, or identify (and rely on) an available computer scientist to analyze large sequence datasets. The advent of AI-powered programs like ChatGPT may offer a way around this disconnect between biologists and the analysis of genomic data critically important to their field. The work detailed herein demonstrates how incorporating ChatGPT into an existing Course-based Undergraduate Research Experience curriculum can equip biology students with no programming expertise to generate their own programs and carry out a publishable, comprehensive analysis of real-world Next Generation Sequencing (NGS) datasets. Relying solely on the students' biology background to prompt ChatGPT to generate Python code, we found that students could readily produce programs able to handle and analyze NGS datasets larger than 10 gigabytes. In summary, we believe that integrating ChatGPT into education can help bridge a critical gap between biology and computer science and may prove similarly beneficial in other disciplines. Additionally, ChatGPT can provide biological researchers with powerful new tools for NGS dataset analysis that may help accelerate major new advances in the field.