Nugmanov G A, Komkov A Y, Saliutina M V, Minervina A A, Lebedev Y B, Mamedov I Z
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997 Russia.
Mol Biol (Mosk). 2019 Jan-Feb;53(1):154-165. doi: 10.1134/S0026898419010117.
Retroelements are considered one of the important sources of genomic variability in modern humans. Transposition activity of retroelements in germline cells is known to generate new insertions in various genomic loci and sometimes results in genetic diseases. Retroelement activity in somatic cells is restricted by different cellular mechanisms; however, there is evidence of such activity in some tissue types. Somatic insertions can trigger tumorigenesis or participate in normal functions such as the generation of neuronal plasticity. Despite the rapid development of high-throughput sequencing methods, confident detection of somatic insertions remains a challenging task. This is due, in part, to the absence of adequate bioinformatic tools for the analysis of sequencing data. Here, we propose an advanced computational pipeline for the identification of somatic insertions in datasets generated by selective amplification and high-throughput sequencing of genomic regions flanking insertions of AluYa5. Particular attention is paid to the identification of various artifacts arising in the course of library preparation and to the parameters for filtering them. Pipeline sensitivity is confirmed by in silico experiments with artificial datasets. Using the proposed pipeline, we remove at least 80% of artifacts and preserve 75% of potentially somatic insertions. The approaches used in this work can be applied to the study of insertion variability of other mobile elements.
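The two figures reported above (share of artifacts removed, share of potentially somatic insertions preserved) can be illustrated with a minimal sketch of how such rates might be computed on an artificial dataset in which the true status of every candidate is known. This is not the authors' pipeline: the `Candidate` record, its fields, and the `evaluate_filter` function are hypothetical names introduced only for illustration, and the abstract does not specify the actual filtering criteria.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Candidate:
    """One candidate insertion in an artificial dataset (hypothetical record)."""
    locus: str           # illustrative identifier, e.g. "chr1:12345"
    is_artifact: bool    # ground truth, known because the dataset is artificial
    passed_filter: bool  # outcome of whatever filtering step is being evaluated


def evaluate_filter(candidates: Iterable[Candidate]) -> Tuple[float, float]:
    """Return (fraction of artifacts removed, fraction of true insertions preserved)."""
    items = list(candidates)
    artifacts = [c for c in items if c.is_artifact]
    insertions = [c for c in items if not c.is_artifact]
    if not artifacts or not insertions:
        raise ValueError("need at least one artifact and one true insertion")
    # An artifact counts as removed when it did NOT pass the filter.
    removed = sum(not c.passed_filter for c in artifacts) / len(artifacts)
    # A true insertion counts as preserved when it DID pass the filter.
    preserved = sum(c.passed_filter for c in insertions) / len(insertions)
    return removed, preserved


# Toy data chosen only to show the arithmetic; these are not results from the study.
toy = (
    [Candidate(f"art{i}", True, passed_filter=(i == 0)) for i in range(5)]
    + [Candidate(f"ins{i}", False, passed_filter=(i != 0)) for i in range(4)]
)
removed_rate, preserved_rate = evaluate_filter(toy)
```

In this toy set, four of five artifacts are filtered out and three of four true insertions survive, so the two rates are 0.8 and 0.75. Reporting both numbers together matters because a stricter filter usually raises the first rate at the expense of the second.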