Evaluation of genome sequencing quality in selected plant species using expressed sequence tags.

Affiliation

College of Horticulture, Nanjing Agricultural University, Nanjing City, Jiangsu Province, China.

Publication information

PLoS One. 2013 Jul 29;8(7):e69890. doi: 10.1371/journal.pone.0069890. Print 2013.

Abstract

BACKGROUND

With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advances in sequencing technologies have reduced the cost and time of whole-genome sequencing, enabling more and more plants to be sequenced. Despite this, the quality of the genome sequences has not been evaluated across multiple plant species.

METHODOLOGY/PRINCIPAL FINDINGS

Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is expressed as the ratio of chromosome size to genome size (or of scaffold size to genome size), which ranged from 55.31% to nearly 100%. The accuracy of a genome sequence is expressed as the ratio of matched ESTs to selected ESTs: 52.93% to 98.28% of the randomly selected clean ESTs could be mapped to chromosome sequences, and 89.02% to 98.85% to scaffold sequences. According to the integrity, accuracy and other analyses of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality; followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max; then Sorghum bicolor, Solanum lycopersicum and Fragaria vesca; and finally Lotus japonicus, Medicago truncatula and Malus × domestica. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influence genome sequence assembly.
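The two ratios described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `AssemblyStats` container, its field names, and the numbers in the usage example are hypothetical placeholders, and the abstract does not say how ESTs were cleaned, sampled, or mapped.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Hypothetical container; field names are illustrative, not from the paper."""
    species: str
    assembled_bp: int         # total length of chromosome (or scaffold) sequences
    estimated_genome_bp: int  # estimated genome size
    ests_selected: int        # randomly selected clean ESTs
    ests_matched: int         # selected ESTs that mapped to the assembly


def integrity(stats: AssemblyStats) -> float:
    """Integrity (%) = assembled sequence size / estimated genome size."""
    if stats.estimated_genome_bp <= 0:
        raise ValueError("estimated genome size must be positive")
    return 100.0 * stats.assembled_bp / stats.estimated_genome_bp


def accuracy(stats: AssemblyStats) -> float:
    """Accuracy (%) = matched ESTs / selected ESTs."""
    if stats.ests_selected <= 0:
        raise ValueError("number of selected ESTs must be positive")
    return 100.0 * stats.ests_matched / stats.ests_selected


# Toy numbers for illustration only; these are not values from the study.
toy = AssemblyStats(
    species="toy species",
    assembled_bp=800,
    estimated_genome_bp=1000,
    ests_selected=1000,
    ests_matched=950,
)
print(f"{toy.species}: integrity {integrity(toy):.2f}%, accuracy {accuracy(toy):.2f}%")
```

Both functions return percentages so that the results are directly comparable with the ranges quoted in the abstract. The mapping step that produces the matched-EST count is left out because the abstract does not name the alignment tool or thresholds used.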

CONCLUSION

The quality of plant genome sequences was found to be lower than expected. The rapid development of genome sequencing projects, together with research on bioinformatics tools and genome assembly algorithms, should therefore provide for further processing and correction of genome sequences that have already been published.
