Park Sinwoo, Lee Jinbaek, Kim Jaeryeong, Kim Dohyeon, Lee Jin Hyup, Pack Seung Pil, Seo Minseok
Department of Computer and Information Science, Korea University, Sejong City, Republic of Korea.
Department of Computer Convergence Software, Korea University, Sejong City, Republic of Korea.
Front Vet Sci. 2023 Feb 21;10:1128570. doi: 10.3389/fvets.2023.1128570. eCollection 2023.
Reference genomes and gene annotations are key resources that determine the limits of molecular biology research on a species; however, systematic research on assessing their quality remains insufficient.
We collected reference assemblies, gene annotations, and 3,420 RNA-sequencing (RNA-seq) datasets from 114 species and selected effective indicators for evaluating reference genome quality across species simultaneously, including statistics obtained empirically while mapping short reads. Furthermore, we introduced and applied two new measures, transcript diversity and quantification success rate, which allow the quality of gene annotations to be compared across species. Finally, we proposed a next-generation sequencing (NGS) applicability index that integrates a total of 10 effective indicators for evaluating the genome and gene annotation of a given species.
Using these indicators, we evaluated and demonstrated the relative accessibility of NGS applications in all of the species examined, which will directly help determine the technological boundaries for each species. We also expect the index to serve as a key indicator of where future development should be directed, through relative quality evaluation of genomes and gene annotations in each species, including the many organisms whose genomes and gene annotations have yet to be constructed.
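The abstract does not say how the 10 indicators are combined into the NGS applicability index. As a rough illustration of the general idea only, the sketch below assumes min-max scaling of each indicator across species followed by an unweighted mean. The function name, the indicator names (`mapping_rate`, `n_scaffolds`, `quant_success`), the toy values, and the equal weighting are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def applicability_index(
    indicators: Dict[str, Dict[str, float]],
    lower_is_better: List[str],
) -> Dict[str, float]:
    """Combine per-species indicators into one score in [0, 1].

    indicators: species -> {indicator name -> raw value}
    lower_is_better: indicator names where a smaller raw value is better.
    Assumption: min-max scaling plus an unweighted mean (not the paper's method).
    """
    species = list(indicators)
    names = sorted({n for vals in indicators.values() for n in vals})
    scores = {s: 0.0 for s in species}
    for name in names:
        column = [indicators[s][name] for s in species]
        lo, hi = min(column), max(column)
        span = hi - lo
        for s in species:
            # A constant indicator carries no ranking information; use 0.5.
            scaled = 0.5 if span == 0 else (indicators[s][name] - lo) / span
            if name in lower_is_better:
                scaled = 1.0 - scaled
            scores[s] += scaled / len(names)
    return scores


# Hypothetical toy data for three species and three indicators.
toy = {
    "species_a": {"mapping_rate": 0.95, "n_scaffolds": 100, "quant_success": 0.8},
    "species_b": {"mapping_rate": 0.70, "n_scaffolds": 5000, "quant_success": 0.4},
    "species_c": {"mapping_rate": 0.85, "n_scaffolds": 2550, "quant_success": 0.6},
}
scores = applicability_index(toy, lower_is_better=["n_scaffolds"])
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Because each indicator is scaled relative to the other species in the set, a score of this kind is only meaningful as a relative ranking, which matches the abstract's emphasis on relative evaluation across species.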