Qian Yu, Kehr Birte, Halldórsson Bjarni V
Bioinformatics Research Center, Aarhus University, Aarhus, Denmark.
deCODE genetics/Amgen, Reykjavík, Iceland.
PeerJ. 2015 Sep 22;3:e1269. doi: 10.7717/peerj.1269. eCollection 2015.
Alu elements are sequences of approximately 300 basepairs that together comprise more than 10% of the human genome. Due to their recent origin in primate evolution, some Alu elements are polymorphic in humans, present in some individuals while absent in others. We present PopAlu, a tool to detect polymorphic Alu elements on a population scale from paired-end sequencing data. PopAlu uses read pair distance and orientation as well as split reads to identify the location and precise breakpoints of polymorphic Alus. Genotype calling enables us to differentiate between homozygous and heterozygous carriers, making the output of PopAlu suitable for use in downstream analyses such as genome-wide association studies (GWAS). We show on a simulated dataset that PopAlu calls Alu elements inserted and deleted with respect to a reference genome with high accuracy and high precision. Our analysis of real data from a human trio from the 1000 Genomes Project confirms that PopAlu is able to produce highly accurate genotype calls. To our knowledge, PopAlu is the first tool that identifies polymorphic Alu elements from multiple individuals simultaneously, pinpoints the precise breakpoints and calls genotypes with high accuracy.
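To make the genotype-calling step concrete, the following is a minimal sketch of how homozygous and heterozygous carriers could be told apart from counts of reads that support or contradict an Alu insertion at one locus. It is not PopAlu's actual model, which the abstract does not describe: the binomial likelihood, the function names `genotype_log_likelihoods` and `call_genotype`, and the `error_rate` parameter are all assumptions made for illustration.

```python
from math import comb, log


def genotype_log_likelihoods(alu_reads, ref_reads, error_rate=0.01):
    """Log-likelihood of the read counts under each diploid genotype.

    Hypothetical model (an assumption, not taken from PopAlu): every read
    at the locus independently supports the Alu allele with probability
    error_rate (0/0), 0.5 (0/1), or 1 - error_rate (1/1).
    """
    n = alu_reads + ref_reads
    p_alu = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {
        gt: log(comb(n, alu_reads)) + alu_reads * log(p) + ref_reads * log(1.0 - p)
        for gt, p in p_alu.items()
    }


def call_genotype(alu_reads, ref_reads, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    lls = genotype_log_likelihoods(alu_reads, ref_reads, error_rate)
    return max(lls, key=lls.get)
```

Under this sketch, a locus where roughly half of the reads support the insertion would be called heterozygous, while a locus where nearly all reads support it would be called homozygous for the Alu allele. A population-scale caller would additionally need to combine evidence across individuals, which this example does not attempt.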