PERHAPS: Paired-End short Reads-based HAPlotyping from next-generation Sequencing data.

Affiliations

Department of Global Health, School of Public Health, Peking University, Beijing, China.

Genetics and Animal Breeding Group, School of Pharmacy, University of Camerino, Italy.

Publication information

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa320.

Abstract

The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the classic APOE polymorphism (*1/*2/*3/*4), since it represents one of the best examples of a functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the large Whole Exome Sequencing (WES) and SNP-array datasets obtained from the multi-ethnic UK Biobank (UKBB, N=48,855). By applying PERHAPS, which pieces together paired-end reads according to their FASTQ labels, we extracted the haplotype data, along with their frequencies and the individual diplotypes. Concordance rates between diplotypes called directly from WES and those generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), whether the sample is stratified by SNP-array genotyping batch or by self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of the obtained haplotype frequencies with those available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. Despite some acknowledged technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.
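
The abstract says only that PERHAPS pieces together paired-end reads by their FASTQ labels; it does not give the implementation. The following is a minimal Python sketch of that general idea, not the authors' code. It assumes the bases at the two APOE SNPs have already been extracted from aligned reads into hypothetical `(fragment_name, snp_id, base)` tuples, where both mates of a pair share one fragment name. The rsIDs and the allele table are standard background knowledge about APOE rather than details from the abstract, and the function names and the `min_fraction` threshold are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Standard APOE allele definitions on the forward strand (background knowledge,
# not stated in the abstract): (rs429358 base, rs7412 base) -> allele.
APOE_ALLELES = {
    ("C", "T"): "*1",
    ("T", "T"): "*2",
    ("T", "C"): "*3",
    ("C", "C"): "*4",
}


def haplotypes_from_fragments(observations):
    """Count haplotypes seen on fragments that cover both SNPs.

    observations: iterable of (fragment_name, snp_id, base) tuples, one per
    read/SNP overlap (a hypothetical, simplified input format).
    """
    by_fragment = defaultdict(dict)
    conflicted = set()
    for name, snp, base in observations:
        seen = by_fragment[name]
        # Overlapping mates that disagree at a SNP make the fragment unusable.
        if snp in seen and seen[snp] != base:
            conflicted.add(name)
        seen[snp] = base

    counts = Counter()
    for name, seen in by_fragment.items():
        if name in conflicted:
            continue
        # Only fragments observing both SNPs give a direct haplotype.
        if "rs429358" in seen and "rs7412" in seen:
            allele = APOE_ALLELES.get((seen["rs429358"], seen["rs7412"]))
            if allele is not None:
                counts[allele] += 1
    return counts


def call_diplotype(counts, min_fraction=0.2):
    """Call a diplotype from haplotype counts; None if empty or ambiguous.

    min_fraction is an illustrative cutoff for ignoring haplotypes that are
    supported by only a small share of the informative fragments.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    supported = sorted(a for a, n in counts.items() if n / total >= min_fraction)
    if len(supported) == 1:
        return (supported[0], supported[0])  # homozygous
    if len(supported) == 2:
        return (supported[0], supported[1])  # heterozygous
    return None  # more than two supported haplotypes


# Toy input: frag4 covers only one SNP, so it is not informative.
obs = [
    ("frag1", "rs429358", "T"), ("frag1", "rs7412", "C"),
    ("frag2", "rs429358", "C"), ("frag2", "rs7412", "C"),
    ("frag3", "rs429358", "T"), ("frag3", "rs7412", "C"),
    ("frag4", "rs429358", "C"),
    ("frag5", "rs429358", "C"), ("frag5", "rs7412", "C"),
]
counts = haplotypes_from_fragments(obs)
diplotype = call_diplotype(counts)
```

On the toy input, two fragments support *3 and two support *4, so the sketch calls a *3/*4 diplotype. Because each informative fragment carries both SNPs on one physical molecule, the haplotype is read directly rather than inferred statistically, which is the contrast the abstract draws with pre-phasing and imputation of SNP-array data.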
