Generation and analysis of a mouse multitissue genome annotation atlas.

Author information

Adams Matthew, Vollmers Christopher

Affiliations

Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA.

Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA.

Publication information

Genome Res. 2024 Nov 20;34(11):2108-2117. doi: 10.1101/gr.279217.124.

DOI:10.1101/gr.279217.124

PMID:39443154

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11610592/

Abstract

Generating an accurate and complete genome annotation for an organism is complex because the cells within each tissue can express a unique set of transcript isoforms from a unique set of genes. A comprehensive genome annotation should contain information on which tissues express which transcript isoforms at what level. This tissue-level isoform information can then inform a wide range of research questions as well as experiment designs. Long-read sequencing technology combined with advanced full-length cDNA library preparation methods has now reached a throughput and accuracy at which generating these types of annotations is achievable. Here, we show this by generating a genome annotation of the mouse (Mus musculus). We used the nanopore-based R2C2 long-read sequencing method to generate 64 million highly accurate full-length cDNA consensus reads, averaging 5.4 million reads per tissue for a dozen tissues. Using the Mandalorion tool, we processed these reads to generate the Tissue-level Atlas of Mouse Isoforms, which is available as a trackhub for the UCSC Genome Browser and contains at least one full-length isoform for the vast majority of expressed genes in each tissue.