The First Kazakh Whole Genomes: The First Report of NGS Data.

Author information

Akilzhanova Ainur, Kairov Ulykbek, Rakhimova Saule, Molkenov Askhat, Rhie Arang, Kim Jong-Il, Seo Jeong-Sun, Zhumadilov Zhaxybay

Affiliations

Center for Life Sciences, Nazarbayev University, Astana, Kazakhstan.

Department of Biochemistry and Molecular Biology, Genomic Medicine Institute, Seoul National University College of Medicine, South Korea.

Publication information

Cent Asian J Glob Health. 2014 Dec 12;3(Suppl):146. doi: 10.5195/cajgh.2014.146. eCollection 2014.

DOI:10.5195/cajgh.2014.146

PMID:29805883

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5960922/

Abstract

INTRODUCTION

The human genome sequence will underpin human biology and medicine in the next century, providing a single, essential reference for all genetic information. Extraordinary technological advances and falling DNA sequencing costs have made whole genome sequencing (WGS) feasible as a highly accessible test for numerous indications. The international project "Genetic architecture of Kazakh population" is well underway to determine the complete DNA sequence. Next generation sequencing (NGS) is a powerful tool for genetic analysis that will enable us to uncover loci in the genome associated with disease. The aim of this study was to present the first WGS data from 6 Kazakh individuals.

METHODS

This pilot study is among the first to perform WGS on 6 healthy Kazakh individuals. Sequencing was carried out on the Illumina HiSeq2000 next generation sequencing platform according to the manufacturer's protocols. All generated *.bcl files were converted and demultiplexed in a single step using the bcl2fasta application. Sequence reads were aligned against the human hg19 reference genome using bwa-mem. Sorting, removal of intermediate files, assembly of *.bam files, and duplicate marking were performed with the PicardTools package. The GATK HaplotypeCaller tool was used for variant calling. The ClinVar, SNPedia, and COSMIC databases were processed to identify clinical genomic variants in the 6 Kazakh whole genomes. The Java Runtime Environment and R/Bioconductor packages were installed to process the raw data and run program scripts.
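The alignment, duplicate-marking, and variant-calling steps described above can be sketched as a list of command lines. This is a minimal illustration only, not the authors' scripts: the sample name, file names, thread count, read-group string, the `picard` and `gatk` wrapper commands, and the GATK4-style argument syntax are all assumptions (a 2014 analysis would most likely have used an earlier GATK release with a different command-line form). The demultiplexing step is omitted because the abstract gives no details of how it was invoked.

```python
import shlex


def build_pipeline(sample, fastq1, fastq2, reference="hg19.fa", threads=8):
    """Return (argv, stdout_file) pairs for one sample.

    All file names are hypothetical. stdout_file is None when the tool
    writes its own output file; otherwise stdout should be redirected to it.
    """
    # Read group in the literal "\t"-separated form that `bwa mem -R` accepts.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    vcf = f"{sample}.vcf.gz"

    return [
        # 1. Align paired-end reads; bwa mem prints SAM to stdout.
        (["bwa", "mem", "-t", str(threads), "-R", read_group,
          reference, fastq1, fastq2], sam),
        # 2. Coordinate-sort the alignments with Picard.
        (["picard", "SortSam", f"I={sam}", f"O={sorted_bam}",
          "SORT_ORDER=coordinate"], None),
        # 3. Mark PCR/optical duplicates.
        (["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
          f"M={metrics}"], None),
        # 4. Index the BAM so the variant caller can access it by position.
        (["picard", "BuildBamIndex", f"I={dedup_bam}"], None),
        # 5. Call variants (GATK4-style arguments, an assumption).
        (["gatk", "HaplotypeCaller", "-R", reference, "-I", dedup_bam,
          "-O", vcf], None),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for argv, stdout_file in build_pipeline("KZ01", "KZ01_R1.fastq.gz",
                                            "KZ01_R2.fastq.gz"):
        line = shlex.join(argv)
        if stdout_file is not None:
            line += f" > {stdout_file}"
        print(line)
```

The function only builds the commands; actually running them would additionally require the reference to be indexed for bwa and accompanied by the `.fai` and `.dict` files that GATK expects.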

RESULTS

Sequence alignment and mapping to the hg19 reference genome were completed for each of the 6 healthy Kazakh individuals. Between 87,308,581,400 and 107,526,741,301 total base pairs were sequenced per genome, with an average coverage of 29.85x. Between 98.85% and 99.58% of base pairs were mapped, and on average 96.07% were properly paired. Het/Hom and Ti/Tv ratios for each whole genome ranged from 1.35 to 1.52 and from 2.07 to 2.08, respectively. We compared each genome against the existing clinical databases ClinVar, SNPedia, and COSMIC and found 20 to 25, 269 to 288, and 7 to 12 SNP records, respectively. The availability of reference Kazakh genome sequences provides a basis for studying the nature of sequence variation, particularly single nucleotide polymorphisms (SNPs).
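For readers unfamiliar with the summary statistics above, the following sketch shows how mean coverage, the Ti/Tv ratio, and the Het/Hom ratio are commonly defined. It is an illustration on toy data, not the authors' code: the genome length of 3.1 Gb is an assumed round figure for hg19, the helper names are hypothetical, and the reported 29.85x average may have been computed differently (for example, from mapped bases only).

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

ASSUMED_GENOME_LENGTH = 3.1e9  # assumption: approximate hg19 length in bp


def mean_coverage(total_bases, genome_length=ASSUMED_GENOME_LENGTH):
    """Average depth = sequenced bases divided by genome length."""
    return total_bases / genome_length


def ti_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) SNP pairs."""
    ti = sum(1 for ref, alt in snps if (ref.upper(), alt.upper()) in TRANSITIONS)
    tv = len(snps) - ti
    return ti / tv  # raises ZeroDivisionError if there are no transversions


def het_hom_ratio(genotypes):
    """Ratio of heterozygous to homozygous non-reference genotype calls.

    Genotypes use VCF notation such as "0/1", "1/1" or "0|1".
    Missing calls ("./.") and homozygous reference calls ("0/0") are ignored.
    """
    het = hom = 0
    for gt in genotypes:
        a, b = gt.replace("|", "/").split("/")
        if "." in (a, b):
            continue  # missing call
        if a != b:
            het += 1
        elif a != "0":
            hom += 1
    return het / hom  # raises ZeroDivisionError if there are no hom-alt calls


if __name__ == "__main__":
    # Lowest and highest total base counts reported in the abstract.
    for total in (87_308_581_400, 107_526_741_301):
        print(f"{total:,} bp -> about {mean_coverage(total):.1f}x")

    toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
    toy_genotypes = ["0/1", "1/1", "0/1", "0|1", "0/0", "./."]
    print("Ti/Tv:", ti_tv_ratio(toy_snps))
    print("Het/Hom:", het_hom_ratio(toy_genotypes))
```

A Ti/Tv ratio near 2.0 to 2.1 is typical for human whole-genome SNP calls, so the reported 2.07 to 2.08 is in the expected range.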

CONCLUSION

The first whole genome sequencing of Kazakh individuals was performed. In this pilot study, we identified SNPs associated with different conditions. Further WGS studies of the Kazakh population are needed to identify possible genetic variants unique to Kazakhs.
