Louha Swarnali, Ray David A, Winker Kevin, Glenn Travis C
Institute of Bioinformatics, University of Georgia, Athens, GA
Department of Biological Science, Texas Tech University, Lubbock, TX.
G3 (Bethesda). 2020 Apr 9;10(4):1159-1166. doi: 10.1534/g3.119.400929.
The song sparrow, Melospiza melodia, is one of the most widely distributed species of songbirds found in North America. It has been used in a wide range of behavioral and ecological studies. This species' pronounced morphological and behavioral diversity across populations makes it a favorable candidate in several areas of biomedical research. We have generated a high-quality genome assembly of M. melodia using Illumina short read sequences from genomic and proximity-ligation libraries. The assembled genome is 978.3 Mb, with a physical coverage of 24.9×, N50 scaffold size of 5.6 Mb and N50 contig size of 31.7 Kb. Our genome assembly is highly complete, with 87.5% full-length genes present out of a set of 4,915 universal single-copy orthologs present in most avian genomes. We annotated our genome assembly and constructed 15,086 gene models, a majority of which have high homology to related birds. In total, 83% of the annotated genes are assigned putative functions. Furthermore, only ∼7% of the genome is found to be repetitive; these regions and other non-coding functional regions are also identified. The high-quality genome assembly and annotations we report will serve as a valuable resource for facilitating studies on genome structure and evolution that can contribute to biomedical research and serve as a reference in population genomic and comparative genomic studies of closely related species.
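For readers less familiar with the contiguity statistics quoted above, N50 is the sequence length at which scaffolds (or contigs) of that length or longer account for at least half of the total assembly. The following is a minimal sketch of that calculation in Python. The function name `n50` and the toy lengths are illustrative assumptions; they are not the song sparrow data and not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that pushes us to half the total.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (arbitrary units), not real data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

The same function applies to contig lengths; a scaffold N50 is larger than a contig N50 for the same assembly whenever scaffolding joins contigs across gaps, which matches the pattern of the values reported above.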