
Confounding factors in profiling of locus-specific human endogenous retrovirus (HERV) transcript signatures in primary T cells using multi-study-derived datasets.

Affiliations

Leibniz Institute of Virology (LIV), Hamburg, Germany.

Institute for Infection Research and Vaccine Development, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.

Publication information

BMC Med Genomics. 2023 Apr 3;16(1):68. doi: 10.1186/s12920-023-01486-y.

Abstract

BACKGROUND

Human endogenous retroviruses (HERV) are repetitive sequence elements that make up a substantial part of the human genome. Their role in development has been well documented, and there is now mounting evidence that dysregulated HERV expression also contributes to various human diseases. While research on HERV elements has in the past been hampered by their high sequence similarity, advances in sequencing technology and analytical tools have empowered the field. For the first time, we are now able to undertake locus-specific HERV analysis, deciphering the expression patterns, regulatory networks and biological functions of these elements. To do so, we inevitably rely on omics datasets available in the public domain. However, technical parameters differ between studies, making inter-study analysis challenging. Here, we address the issue of confounding factors when profiling locus-specific HERV transcriptomes using datasets from multiple sources.

METHODS

We collected RNAseq datasets of CD4 and CD8 primary T cells and extracted HERV expression profiles for 3220 elements, representing most intact, near full-length proviruses. Taking sequencing parameters and batch effects into account, we compared HERV signatures across datasets and determined which features permit HERV expression analysis from multiple-source data.
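The abstract does not spell out how a HERV signature is defined or how overlap between signatures is scored. As a minimal sketch, assuming a signature is the set of elements whose counts per million (CPM, relative to the full library size) reach a threshold, and that overlap is measured with the Jaccard index, a cross-sample comparison could look like this. The element IDs, read counts, library sizes and the 1 CPM cut-off are all hypothetical.

```python
from typing import Dict, Set


def herv_signature(counts: Dict[str, int], library_size: int, min_cpm: float = 1.0) -> Set[str]:
    """Return the IDs of elements whose CPM is at or above min_cpm.

    CPM is computed against the total library size (genic + HERV reads),
    not just the HERV counts, so libraries of different depth are comparable.
    The 1 CPM default is an illustrative assumption, not the paper's cut-off.
    """
    return {
        element
        for element, n in counts.items()
        if n * 1e6 / library_size >= min_cpm
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; two empty signatures count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical samples from two studies sequenced to different depths.
sample_a = {"HERV_1": 500, "HERV_2": 20, "HERV_3": 0}
sample_b = {"HERV_1": 300, "HERV_2": 10, "HERV_3": 40}

sig_a = herv_signature(sample_a, library_size=10_000_000)  # CPM 50, 2, 0
sig_b = herv_signature(sample_b, library_size=5_000_000)   # CPM 60, 2, 8
print(sorted(sig_a), sorted(sig_b), round(jaccard(sig_a, sig_b), 3))
```

Normalising to the whole library rather than to HERV reads alone keeps samples of different sequencing depth on a common scale before their signatures are intersected.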

RESULTS

We demonstrate that, among sequencing parameters, sequencing depth has the greatest influence on the HERV signature outcome: sequencing samples more deeply broadens the spectrum of expressed HERV elements. Sequencing mode and read length are secondary parameters. Nevertheless, we find that HERV signatures from smaller RNAseq datasets reliably reveal the most abundantly expressed HERV elements. Overall, HERV signatures overlap substantially between samples and studies, indicating a robust HERV transcript signature in CD4 and CD8 T cells. Moreover, we find that batch effect reduction measures are critical to uncover genic and HERV expression differences between cell types. Once these measures were applied, differences in the HERV transcriptome between the ontologically closely related CD4 and CD8 T cells became apparent.
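The depth effect can be illustrated by downsampling a library in silico. The sketch below is a simulation, not the authors' pipeline: it gives each read one uniform random draw and keeps the reads whose draw falls below the target fraction, so smaller subsamples are nested inside larger ones. The per-element counts and the 5-read detection threshold are hypothetical assumptions.

```python
import random
from typing import Dict, List


def detected_by_depth(
    counts: Dict[str, int],
    fractions: List[float],
    min_reads: int = 5,
    seed: int = 0,
) -> Dict[float, int]:
    """Count elements with >= min_reads reads after downsampling to each fraction.

    Each read gets one uniform draw in [0, 1); a read is kept at fraction f
    if its draw is < f. Subsamples are therefore nested, so the number of
    detected elements can only grow as f increases.
    """
    rng = random.Random(seed)
    draws = {element: [rng.random() for _ in range(n)] for element, n in counts.items()}
    result = {}
    for f in sorted(fractions):
        result[f] = sum(
            1 for read_draws in draws.values()
            if sum(u < f for u in read_draws) >= min_reads
        )
    return result


# Hypothetical full-depth read counts: a few abundant elements, a long tail of weak ones.
full_depth = {f"HERV_{i}": n for i, n in enumerate([2000, 800, 150, 60, 30, 12, 8, 6, 3, 1])}

for fraction, n_detected in detected_by_depth(full_depth, [0.1, 0.25, 0.5, 1.0]).items():
    print(f"{fraction:>4.0%} of reads -> {n_detected} elements detected")
```

Abundant elements stay above the threshold even at 10% of the reads, while weakly expressed ones only cross it as depth increases. This mirrors the pattern above, in which shallow datasets still capture the most abundant HERV elements and deeper sequencing widens the detected spectrum.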

CONCLUSION

In our systematic approach to determining sequencing and analysis parameters for the detection of locus-specific HERV expression, we provide evidence that analysis of RNAseq datasets from multiple studies can increase confidence in biological findings. When generating de novo HERV expression datasets, we recommend greater sequencing depth (≥ 100 million reads) than in standard genic transcriptome pipelines. Finally, batch effect reduction measures need to be implemented to allow for differential expression analysis.
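The abstract does not name the batch correction method used. One simple option, shown here as an assumption rather than the authors' method, is to remove an additive per-study offset from log-scale expression by centring each batch on its own mean and adding back the grand mean. This only preserves the CD4/CD8 difference when both cell types are represented in similar proportions in every batch. For unbalanced designs, established tools such as ComBat/ComBat-seq (sva package), limma's `removeBatchEffect` with a design matrix, or including batch as a covariate in the differential expression model are safer choices. The values below are hypothetical log2 CPM values for a single HERV element.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def remove_additive_batch(values: List[float], batches: List[str]) -> List[float]:
    """Subtract each batch's mean and add back the grand mean (one element).

    Assumes an additive batch effect on a log scale and a balanced design,
    i.e. every batch contains both cell types in similar proportions.
    """
    grand_mean = mean(values)
    by_batch: Dict[str, List[float]] = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical log2 CPM of one HERV element; study_B reads ~3 units higher overall.
log_expr = [5.0, 3.0, 8.0, 6.0]
cell_type = ["CD4", "CD8", "CD4", "CD8"]
study = ["study_A", "study_A", "study_B", "study_B"]

corrected = remove_additive_batch(log_expr, study)
for ct, s, before, after in zip(cell_type, study, log_expr, corrected):
    print(f"{s} {ct}: {before:.1f} -> {after:.1f}")
```

Before correction, the CD4 sample from study_A (5.0) sits below the CD8 sample from study_B (6.0). After correction, both CD4 samples are at 6.5 and both CD8 samples at 4.5, so the 2-unit cell-type difference is kept while the study offset disappears.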

