The Homeland Security Systems Engineering and Development Institute (HSSEDI), operated by The MITRE Corporation, McLean, Virginia, USA.
Department of Advanced Technology, The MITRE Corporation, 7515 Colshire Drive, McLean, Virginia, 22102, USA.
BMC Bioinformatics. 2018 Apr 11;19(1):126. doi: 10.1186/s12859-018-2133-2.
Single nucleotide polymorphisms (SNPs) in the human genome have proven useful as identity markers for differentiating DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow the design of suites of identity-linked target regions that can be sequenced in a multiplexed, massively parallel manner. Tools are therefore needed to leverage the genotypic information in SNP databases to discover genomic targets that can be evaluated on MPS platforms.
The SNP island target identification algorithm (TIA) was developed as a user-tunable system for leveraging the SNP information in such databases. Using data from the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that are amenable to targeted resequencing on MPS platforms. Algorithmic filters excluded candidate regions that did not conform to the user-tuned SNP island target characteristics. To validate the accuracy of TIA in discovering identity-linked SNP islands within the human genome, SNP island target regions were amplified by the polymerase chain reaction from genomic DNA samples of 70 contributors. The multiplexed amplicons were sequenced on the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variation. A total of 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discriminating power across individual SNP profiles, 74 were previously undefined SNPs identified during evaluation of targets from individual genomes. Overall, the DNA samples of all 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands.
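The closing result, that 70 individuals were uniquely identified using a subset of the SNP islands, can be framed as a check that no two samples share the same multi-locus genotype profile. The following is a minimal sketch under stated assumptions. Genotypes are assumed to be encoded as one string per SNP (e.g. "AG"). The function names `tied_pairs` and `greedy_snp_subset` and the toy samples are hypothetical. Greedy forward selection is a swapped-in heuristic; the abstract does not say how the identifying subset was chosen.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical encoding: each sample ID maps to one genotype call per SNP,
# e.g. ["AG", "CC", "TT"]. Not the study's actual data format.
Profiles = Dict[str, Sequence[str]]


def tied_pairs(profiles: Profiles, snp_idx: List[int]) -> int:
    """Count sample pairs whose genotypes agree at every selected SNP."""
    groups = Counter(tuple(calls[i] for i in snp_idx) for calls in profiles.values())
    return sum(c * (c - 1) // 2 for c in groups.values())


def greedy_snp_subset(profiles: Profiles) -> List[int]:
    """Greedily add the SNP that leaves the fewest tied pairs.

    Stops when every sample has a distinct profile, or when no remaining SNP
    separates any further pairs (the panel cannot resolve all samples).
    """
    if not profiles:
        return []
    n_snps = len(next(iter(profiles.values())))
    chosen: List[int] = []
    ties = tied_pairs(profiles, chosen)
    while ties > 0:
        candidates = [i for i in range(n_snps) if i not in chosen]
        if not candidates:
            break
        best = min(candidates, key=lambda i: tied_pairs(profiles, chosen + [i]))
        best_ties = tied_pairs(profiles, chosen + [best])
        if best_ties >= ties:
            break  # no candidate improves discrimination
        chosen.append(best)
        ties = best_ties
    return chosen


# Toy example (invented data): three samples typed at three SNPs.
samples = {
    "S1": ["AA", "CT", "GG"],
    "S2": ["AA", "CC", "GG"],
    "S3": ["AG", "CT", "GG"],
}
subset = greedy_snp_subset(samples)
print(subset, tied_pairs(samples, subset))
```

In the toy data, the third SNP is monomorphic and adds nothing. Neither of the first two SNPs alone separates all three samples, but together they do. The same logic explains why only a subset of a larger panel may be needed.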
TIA offers a tunable genome search tool for discovering targeted genomic regions that are scalable in both the population frequency and the number of SNPs contained within the SNP island regions. It also lets users define the sequence length and sequence variability of the target region, as well as of the less variable flanking regions, so that targets can be tailored to MPS platforms. As shown in this study, TIA can be used to discover identity-linked SNP islands within the human genome that are useful for differentiating individuals by targeted resequencing on MPS technologies.