Xiao Wenming, Wu Leihong, Yavas Gokhan, Simonyan Vahan, Ning Baitang, Hong Huixiao
National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD 20993, USA.
Pharmaceutics. 2016 Apr 22;8(2):15. doi: 10.3390/pharmaceutics8020015.
Although each of us shares more than 99% of the DNA sequence in our genomes, millions of small regions differ in sequence or structure between individuals, giving us different appearances and different responses to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived from the DNA of multiple individuals. As a result, it is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare the sequences of diseased tissue with the same individual's genome sequence derived from tissue in a normal state. Because the cost of sequencing a human genome has dropped dramatically to around $1000, documenting a personal genome for every individual has a promising future. However, de novo assembly of individual genomes at an affordable cost is still challenging, and so far only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging "third generation sequencing" technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructure. We suggest that hybrid assembly combining long and short reads is a promising way to generate good-quality human genome assemblies, and we specify parameters for assessing the quality of assembly outcomes. We provide a perspective on the benefits of using personal genomes as references and suggestions for obtaining a high-quality personal genome.
Finally, we discuss the use of the personal genome in aiding vaccine design and development, monitoring the host immune response, tailoring drug therapy and detecting tumors. We believe that precision medicine will benefit greatly from bioinformatics solutions, particularly for personal genome assembly.
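The abstract refers to parameters for assessing assembly quality without naming them. As one illustration, a widely used contiguity statistic is N50: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The sketch below is a minimal example under that definition; the function name `n50` and the sample lengths are hypothetical and are not taken from the review.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    # N50 is the length of the contig at which the running total
    # first reaches half of the total assembly length.
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

A higher N50 generally indicates a more contiguous assembly, but it says nothing about correctness, so it would normally be read alongside other measures such as misassembly counts or gene completeness.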