TRFill: synergistic use of HiFi and Hi-C sequencing enables accurate assembly of tandem repeats for population-level analysis.

Authors

Wen Huaming, Yang Jinbao, Zhao Xianjia, Wang Xingbin, Lei Jiawei, Li Yanchun, Du Wenjie, Li Dongxi, Xu Yun, Lonardi Stefano, Pan Weihua

Affiliations

School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, China.

State Key Laboratory of Genome and Multi-Omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China.

Publication information

Genome Biol. 2025 Jul 28;26(1):227. doi: 10.1186/s13059-025-03685-5.

Abstract

The highly repetitive content of eukaryotic genomes, including long tandem repeats, segmental duplications, and centromeres, makes haplotype-resolved genome assembly hard. Repeat sequences introduce gaps or mis-joins in the assemblies. We introduce TRFill, a novel algorithm that can close the gaps in a draft chromosome-level assembly using exclusively PacBio HiFi and Hi-C data. Experimental results on human centromeres and tomato subtelomeres show that TRFill can improve the completeness and correctness of about two-thirds of the tandem repeats. We also show that the improved completeness of subtelomeric tandem repeats in the tomato pangenome enables a population-level analysis of these complex repeats.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2b6/12305924/b14fcd6fe47b/13059_2025_3685_Fig1_HTML.jpg
