Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology.

Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA.

Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA.

Publication information

Nucleic Acids Res. 2020 Feb 20;48(3):1146-1163. doi: 10.1093/nar/gkz1173.

Abstract

Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and can occasionally lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent from previous short-read studies (90/203). Approximately 81% (73/90) of these insertions reside within endogenous LINE-1 sequences in the reference assembly, and analysis of unique breakpoint junction sequences revealed that 63% (57/90) of them could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to wide variation in L1Hs insertion detection rates among four individual NA12878 cells; under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false-positive calls. In sum, these data indicate that L1Hs insertions are often missed by standard short-read sequencing approaches and that long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes.
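
The abstract states that 57 of the 90 newly identified insertions could be genotyped in 1000 Genomes Project data through their unique breakpoint junction sequences. As a rough illustration of that general idea (not PALMER and not the authors' actual pipeline), the Python sketch below builds a short probe spanning the flank/insertion junction, checks that the probe is absent from the reference on both strands, and counts short reads that contain it. All function names, the 10-bp half-window, the two-read threshold, and the toy sequences are hypothetical.

```python
from typing import Iterable, Optional

# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def junction_probe(left_flank: str, insertion: str, half: int = 10) -> str:
    """Hypothetical helper: join the last `half` bases of the reference flank
    to the first `half` bases of the inserted sequence."""
    return (left_flank[-half:] + insertion[:half]).upper()


def is_unique_in_reference(probe: str, reference: str) -> bool:
    """A junction is informative only if neither strand occurs in the reference."""
    ref = reference.upper()
    return probe.upper() not in ref and revcomp(probe) not in ref


def count_supporting_reads(probe: str, reads: Iterable[str]) -> int:
    """Count reads that contain the junction probe on either strand."""
    fwd, rev = probe.upper(), revcomp(probe)
    return sum(1 for read in reads if fwd in read.upper() or rev in read.upper())


def genotype_presence(probe: str, reference: str, reads: Iterable[str],
                      min_reads: int = 2) -> Optional[bool]:
    """Return None if the junction is not unique (cannot be genotyped);
    otherwise True if at least `min_reads` reads (assumed threshold) support it."""
    if not is_unique_in_reference(probe, reference):
        return None
    return count_supporting_reads(probe, reads) >= min_reads


# Toy data, made up for illustration; not real genomic sequence.
LEFT_FLANK = "GATTACAGATTACACCGGTT"
RIGHT_FLANK = "CATGCATGCATGCATGCATG"
INSERTION = "GGGGGAGGAGCCAAGATGGC"
REFERENCE = LEFT_FLANK + RIGHT_FLANK          # reference allele: no insertion
ALT_READ = "ATTACACCGGTTGGGGGAGGAGCCAAG"     # toy read spanning the junction
READS = [ALT_READ, revcomp(ALT_READ), "GATTACACCGGTTCATGCATG"]

if __name__ == "__main__":
    probe = junction_probe(LEFT_FLANK, INSERTION)
    print(probe, genotype_presence(probe, REFERENCE, READS))
```

The uniqueness check is the central design choice in this sketch: a junction sequence that also occurs elsewhere in the reference cannot distinguish the insertion allele from reference sequence, so the function returns None rather than a genotype. Real data would likely also require tolerance for sequencing errors and evidence from both breakpoints.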

Figure 1 (image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/badd/7026601/5ec3e1ff0de9/gkz1173fig1.jpg)
