A practical guide to identifying associations between tandem repeats and complex human traits using consensus genotypes from multiple tools.

Author information

Lujumba Ibra, Adam Yagoub, Ziaei Jam Helyaneh, Isewon Itunuoluwa, Monnakgotla Nomakhosazana, Li Yang, Onyido Blessing, Fredrick Kakembo, Adegoke Faith, Emmanuel Jerry, Adeyemi Jumoke, Ibitoye Olajumoke, Owusu-Ansah Samuel, Akanle Matthew Boladele, Joseph Habi, Nsubuga Mike, Galiwango Ronald, Okitwi Martin, Magdalene Namuswe, Walter Odur, Mngadi Zama, Adebiyi Marion, Oyelade Jelili, Nel Melissa, Jjingo Daudi, Gymrek Melissa, Adebiyi Ezekiel

Affiliations

The African Center of Excellence in Bioinformatics and Data Intensive Sciences, Makerere University, Kampala, Uganda.

Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria.

Publication information

Nat Protoc. 2025 Sep 1. doi: 10.1038/s41596-025-01231-y.

Abstract

Tandem repeats (TRs) are highly variable loci in the human genome that are linked to various human phenotypes. Accurate and reliable genotyping of TRs is important in understanding population TR variation dynamics and their effects in TR-trait association studies. In this protocol, we describe how to generate high-quality consensus TR genotypes for population genomics studies. In particular, we detail steps to: (i) perform TR genotyping from short-read whole-genome sequencing data by using the HipSTR, GangSTR, adVNTR and ExpansionHunter tools, (ii) perform quality control checks on TR genotypes by using TRTools and (iii) integrate TR genotypes from different tools by using EnsembleTR. We further discuss how to visualize and investigate TR variation patterns to identify population-specific expansions and perform TR-trait association analyses. We demonstrate the utility of these steps by analyzing a small dataset from the 1000 Genomes Project. In addition, we recapitulate a previously identified association between TR length and gene expression in the African population and provide a generalized discussion on TR analysis and its relevance to identifying complex traits. The expected time for installing the necessary software for each section is ~10 min. The expected run time on the user's desired dataset can vary from hours to days depending on factors such as the size of the data, input parameters and the capacity of the computing infrastructure.
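
To make the TR-trait association step concrete, the following minimal sketch regresses a quantitative trait (for example, gene expression) on TR length. It is an illustration under stated assumptions, not the protocol's implementation: the function names `tr_dosage` and `linear_association` and the toy data are hypothetical, each genotype is assumed to be a pair of repeat copy numbers, and a real analysis would also adjust for covariates such as population structure and test for significance.

```python
from statistics import mean


def tr_dosage(genotype):
    """Sum the repeat copy numbers of the two alleles, e.g. (10, 12) -> 22."""
    allele_a, allele_b = genotype
    return allele_a + allele_b


def linear_association(dosages, trait):
    """Return the least-squares slope, intercept and Pearson r of trait on dosage."""
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = mean(dosages), mean(trait)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and trait must both vary across samples")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical toy data: one TR locus genotyped in five samples,
# paired with a made-up expression value per sample.
genotypes = [(10, 10), (10, 12), (12, 12), (12, 14), (14, 14)]
expression = [1.0, 2.0, 3.0, 4.0, 5.0]

dosages = [tr_dosage(g) for g in genotypes]
slope, intercept, r = linear_association(dosages, expression)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

Summing the two allele lengths is only one possible encoding of a TR genotype; the choice of encoding, and of linear versus other models, depends on the trait and the locus.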
