Meisel Richard P, Freeman Jamie C, Asgari Danial, Llaca Victor, Fengler Kevin A, Mann David, Rastogi Achal, Loso Mike, Geng Chaoxian, Scott Jeffrey G
Department of Biology and Biochemistry, Science and Research 2, University of Houston, Houston, Texas, USA.
Department of Entomology, Comstock Hall, Cornell University, Ithaca, New York, USA.
Arch Insect Biochem Physiol. 2023 Nov;114(3):e22049. doi: 10.1002/arch.22049. Epub 2023 Aug 22.
The house fly, Musca domestica, is a pest of livestock, transmits pathogens of human diseases, and is a model organism in multiple biological research areas. The first house fly genome assembly was published in 2014 and has been of tremendous use to the community of house fly biologists, but that genome is discontiguous and incomplete by contemporary standards. To improve the house fly reference genome, we sequenced, assembled, and annotated the house fly genome using improved techniques and technologies that were not available at the time of the original genome sequencing project. The new genome assembly is substantially more contiguous and complete than the previous assembly. It has a scaffold N50 of 12.46 Mb, a 50-fold improvement over the previous assembly. In addition, the new assembly is within 1% of the genome size estimated by flow cytometry, whereas the previous assembly was missing nearly one-third of the predicted genome sequence. Large gene families are also contained in much more contiguous scaffolds in the improved assembly. To illustrate the benefit of the new genome, we used it to investigate tandemly arrayed immune gene families. The new contiguous assembly of these loci provides a clearer picture of how immune gene expression is regulated, and it leads to new insights into the selection pressures that shape their evolution.
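Scaffold N50, the contiguity metric reported in the abstract, is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The Python sketch below shows that calculation for a list of scaffold lengths. The function name `n50` and the toy lengths are hypothetical illustrations. They are not the authors' pipeline or data from the house fly assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (e.g., scaffold sizes in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    # Sort longest first so the running total accumulates the biggest scaffolds first.
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("need at least one sequence length")

    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        # The first scaffold that pushes the running total to half or more is the N50.
        if running >= half_total:
            return length

    # Not reached for non-empty input; kept so every path returns an int.
    return sorted_lengths[-1]


if __name__ == "__main__":
    # Hypothetical toy scaffold lengths, not data from the house fly genome.
    toy_scaffolds = [100, 80, 60, 40, 20]
    print(n50(toy_scaffolds))  # total 300, half 150: 100 + 80 = 180 >= 150, so prints 80
```

Because N50 depends on the total assembled length, an assembly that is missing sequence can report a misleading value. This is one reason the abstract also compares the assembly size with the flow-cytometry estimate.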