Database of Trypanosoma cruzi repeated genes: 20,000 additional gene variants.

Authors

Arner Erik, Kindlund Ellen, Nilsson Daniel, Farzana Fatima, Ferella Marcela, Tammi Martti T, Andersson Björn

Affiliation

Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden.

Publication

BMC Genomics. 2007 Oct 26;8:391. doi: 10.1186/1471-2164-8-391.

DOI:10.1186/1471-2164-8-391

PMID:17963481

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2204015/

Abstract

BACKGROUND

Repeats are present in all genomes, and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats. These repeats include surface molecule genes, and several other gene families. In the T. cruzi genome sequencing project, it was clear that not all copies of repetitive genes were present in the assembly, due to collapse of nearly identical repeats. However, at the time of publication of the T. cruzi genome, it was not clear to what extent this had occurred.

RESULTS

We have developed a pipeline to estimate the genomic repeat content, in which shotgun reads are aligned to the genomic sequence and the gene copy number is estimated from the average shotgun coverage. This method was applied to the genome of T. cruzi, and copy numbers of all protein coding sequences and pseudogenes were estimated. The 22,640 results were stored in a database available online. Of all protein coding sequences and pseudogenes, 18% were estimated to exist in 14 or more copies in the T. cruzi CL Brener genome. The average coverage of the annotated protein coding sequences and pseudogenes indicates a total gene copy number, including allelic gene variants, of over 40,000.
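
The coverage-based estimate described above can be illustrated with a minimal sketch. The abstract does not state how coverage was normalized, so this example assumes the simplest form: the mean read depth over a gene divided by the depth expected for a single-copy region. The function name, the gene names, the depth values, and the baseline of 10 are all hypothetical; only the threshold of 14 copies comes from the text.

```python
from statistics import mean


def estimate_copy_number(gene_depths, single_copy_coverage):
    """Estimate copy number as mean depth over a gene divided by the
    depth expected for a single-copy region (an assumed normalization)."""
    if not gene_depths:
        raise ValueError("gene_depths must not be empty")
    if single_copy_coverage <= 0:
        raise ValueError("single_copy_coverage must be positive")
    return mean(gene_depths) / single_copy_coverage


# Hypothetical per-base read depths for two annotated genes.
depths = {
    "gene_a": [9, 10, 11, 10],        # close to the single-copy baseline
    "gene_b": [138, 142, 140, 140],   # collapsed repeat with inflated depth
}
baseline = 10.0  # assumed average depth of single-copy sequence

# Round each estimate to the nearest whole copy.
estimates = {
    name: round(estimate_copy_number(d, baseline))
    for name, d in depths.items()
}

# Genes at or above the 14-copy threshold mentioned in the abstract.
high_copy = [name for name, copies in estimates.items() if copies >= 14]
```

In this sketch a gene whose reads pile up at 14 times the baseline depth is reported as roughly 14 copies, which is how collapsed repeats in an assembly can be detected from unassembled shotgun data.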

CONCLUSION

Our results indicate that the number of protein coding sequences and pseudogenes in the T. cruzi genome may be twice the previous estimate. We have constructed a database of the T. cruzi gene repeat data that is available as a resource to the community. The main purpose of the database is to enable biologists interested in repeated, unfinished regions to closely examine and resolve these regions themselves using all available shotgun data, instead of having to rely on annotated consensus sequences that are often erroneous and possibly misleading. Five repetitive genes were studied in more detail to illustrate how the database can be used to analyze and extract information about gene repeats with different characteristics in Trypanosoma cruzi.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4399/2204015/c306438a7b36/1471-2164-8-391-1.jpg
