Sharma Virag, Hiller Michael
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
Max Planck Institute for the Physics of Complex Systems, Dresden, Germany.
Nucleic Acids Res. 2017 Aug 21;45(14):8369-8377. doi: 10.1093/nar/gkx554.
Genome alignments provide a powerful basis to transfer gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that more sensitive parameters allow the detection of thousands of novel alignments between orthologous exons that were previously missed. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, such as between human and non-placental vertebrates, benefit from increased sensitivity. To systematically test if increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations by adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest a more sensitive alignment strategy that should generally be used for genome alignments between distantly related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics.
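The distance criterion stated in the abstract can be read as a simple decision rule. The following is a minimal sketch of that reading, not the authors' pipeline: the function name, the preset labels, and the example distances are hypothetical, and only the 0.75 substitutions per neutral site threshold comes from the abstract. The abstract says that comparisons above this distance benefit in particular, so treating it as a hard cutoff is a simplifying assumption.

```python
# Threshold from the abstract: comparisons at an evolutionary distance
# of >0.75 substitutions per neutral site benefit from higher sensitivity.
SENSITIVE_DISTANCE_THRESHOLD = 0.75


def choose_alignment_preset(distance: float) -> str:
    """Return a hypothetical preset label for a pairwise genome alignment.

    `distance` is the evolutionary distance to the reference genome in
    substitutions per neutral site. The labels are illustrative only and
    do not correspond to any real aligner option.
    """
    if distance < 0:
        raise ValueError("evolutionary distance must be non-negative")
    if distance > SENSITIVE_DISTANCE_THRESHOLD:
        return "high_sensitivity"
    return "standard"


# Hypothetical distances, for illustration only (not values from the paper).
example_distances = {"species_A": 0.35, "species_B": 0.75, "species_C": 1.2}
presets = {name: choose_alignment_preset(d) for name, d in example_distances.items()}
```

The comparison is strict, matching the ">0.75" wording, so a species at exactly 0.75 keeps the standard preset in this sketch.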