

Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome.

Author information

The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, USA.


Publication information

Gene. 2019 Apr 5;691:141-152. doi: 10.1016/j.gene.2018.12.040. Epub 2019 Jan 8.

Abstract

The human nuclear genome harbors sequences of mitochondrial origin, indicating an ancestral transfer of DNA from the mitogenome. Several Nuclear Mitochondrial Segments (NUMTs) have been detected by alignment-based sequence similarity search, as implemented in the Basic Local Alignment Search Tool (BLAST). Identifying NUMTs is important for the comprehensive annotation and understanding of the human genome. Here we explore the possibility of detecting NUMTs in the human genome by alignment-free sequence similarity search, such as comparing distributions of k-mers (k-tuples, k-grams, oligos of length k). We find that when k=6 or larger, the k-mer approach and BLAST search produce almost identical results, e.g., they detect the same set of NUMTs longer than 3 kb. However, when k=5 or k=4, certain signals are detected only by the alignment-free approach, and these may indicate as yet unrecognized, and potentially more ancestral, NUMTs. We introduce a "Manhattan plot" style representation of NUMT predictions across the genome, which are calculated from the reciprocal of the Jensen-Shannon divergence between the nuclear and mitochondrial k-mer frequencies. Further inspection of the k-mer-based NUMT predictions, however, shows that most of them contain long-terminal-repeat (LTR) annotations, whereas BLAST-based NUMT predictions do not. Thus, a similarity of the mitogenome to LTR sequences is recognized, which we validate by finding that the mitochondrial k-mer distribution is closer to those of transposable sequences and, specifically, close to those of some types of LTR.
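The scoring idea in the abstract (the reciprocal of the Jensen-Shannon divergence between nuclear and mitochondrial k-mer frequencies) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the base-2 logarithm, the treatment of non-ACGT characters, and the handling of a zero divergence are all assumptions made for illustration. The abstract does not specify the window size, strand handling, or any smoothing.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_frequencies(seq, k):
    """Return normalized frequencies of all 4**k k-mers over A, C, G, T.

    Assumption: k-mers containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def similarity_score(window, reference, k):
    """Reciprocal of the divergence: larger means more similar k-mer usage.

    Assumption: identical distributions (divergence 0) map to infinity.
    """
    d = jensen_shannon_divergence(
        kmer_frequencies(window, k), kmer_frequencies(reference, k)
    )
    return float("inf") if d == 0 else 1.0 / d
```

In a genome scan, `window` would be one stretch of nuclear sequence and `reference` the mitochondrial genome; plotting the score for consecutive windows along each chromosome gives a Manhattan-plot-like picture, with peaks marking candidate regions.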

