Confounding factors in profiling of locus-specific human endogenous retrovirus (HERV) transcript signatures in primary T cells using multi-study-derived datasets.

Affiliations

Leibniz Institute of Virology (LIV), Hamburg, Germany.

Institute for Infection Research and Vaccine Development, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.

Publication information

BMC Med Genomics. 2023 Apr 3;16(1):68. doi: 10.1186/s12920-023-01486-y.

DOI:10.1186/s12920-023-01486-y

PMID:37013607

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10068191/

Abstract

BACKGROUND

Human endogenous retroviruses (HERVs) are repetitive sequence elements that make up a substantial part of the human genome. Their role in development is well documented, and there is mounting evidence that dysregulated HERV expression also contributes to various human diseases. Research on HERV elements was long hampered by their high sequence similarity, but advances in sequencing technology and analytical tools have empowered the field. For the first time, we are able to undertake locus-specific HERV analysis to decipher the expression patterns, regulatory networks and biological functions of these elements. To do so, we inevitably rely on omics datasets available in the public domain. However, technical parameters differ between studies, making inter-study analysis challenging. Here we address the issue of confounding factors in profiling locus-specific HERV transcriptomes using datasets from multiple sources.

METHODS

We collected RNAseq datasets of CD4 and CD8 primary T cells and extracted HERV expression profiles for 3220 elements, representing the most intact, near full-length proviruses. Considering sequencing parameters and batch effects, we compared HERV signatures across datasets and determined which features permit HERV expression analysis from multiple-source data.
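As a rough illustration of what comparing HERV signatures across datasets can look like, the sketch below calls a locus "expressed" when it reaches a minimum read count and then scores pairwise overlap between studies with the Jaccard index. The locus names, read counts, the `min_reads` threshold and the choice of the Jaccard index are all assumptions made for this example; the abstract does not state which overlap measure or cutoff the authors used.

```python
from itertools import combinations


def expressed_loci(counts, min_reads=5):
    """Return the set of locus IDs with at least `min_reads` reads (hypothetical cutoff)."""
    return {locus for locus, n in counts.items() if n >= min_reads}


def jaccard(a, b):
    """Jaccard index of two sets: shared loci divided by all loci seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy read counts per HERV locus for three hypothetical studies.
studies = {
    "study_A": {"HERV_1": 120, "HERV_2": 8, "HERV_3": 0, "HERV_4": 40},
    "study_B": {"HERV_1": 95, "HERV_2": 2, "HERV_3": 6, "HERV_4": 33},
    "study_C": {"HERV_1": 210, "HERV_2": 11, "HERV_3": 1, "HERV_4": 57},
}

# One signature (set of expressed loci) per study, then all pairwise overlaps.
signatures = {name: expressed_loci(c) for name, c in studies.items()}
pairwise = {
    (x, y): jaccard(signatures[x], signatures[y])
    for x, y in combinations(sorted(signatures), 2)
}

for pair, score in pairwise.items():
    print(pair, round(score, 2))
```

In this toy data the most abundant loci (`HERV_1`, `HERV_4`) are shared by every study, while the low-count loci drive the disagreement between signatures.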

RESULTS

We demonstrate that, among the sequencing parameters considered, sequencing depth has the greatest influence on the resulting HERV signature. Sequencing samples more deeply broadens the spectrum of expressed HERV elements that are detected. Sequencing mode and read length are secondary parameters. Nevertheless, we find that HERV signatures from smaller RNAseq datasets do reliably reveal the most abundantly expressed HERV elements. Overall, HERV signatures overlap substantially between samples and studies, indicating a robust HERV transcript signature in CD4 and CD8 T cells. Moreover, we find that measures of batch effect reduction are critical to uncover genic and HERV expression differences between cell types. Once these were applied, differences in the HERV transcriptome between the ontologically closely related CD4 and CD8 T cells became apparent.
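The effect of sequencing depth on the breadth of a HERV signature can be illustrated with a small, deterministic sketch. It scales the read counts of a deeply sequenced sample down to shallower depths and reports which loci would still pass a detection cutoff. The counts, the cutoff of 5 reads and the proportional scaling (expected counts rather than random subsampling) are assumptions for illustration only and are not taken from the study.

```python
def detected_at_depth(counts, full_depth, target_depth, min_reads=5):
    """Loci expected to reach `min_reads` if the library had only `target_depth` reads.

    Counts are scaled proportionally, which gives the expected value under
    random downsampling; real subsampling would add sampling noise on top.
    """
    scale = target_depth / full_depth
    return {locus for locus, n in counts.items() if n * scale >= min_reads}


FULL_DEPTH = 100_000_000  # reads in the hypothetical deep library

# Toy counts: one abundant locus and three progressively rarer ones.
counts_full = {"HERV_a": 2000, "HERV_b": 60, "HERV_c": 12, "HERV_d": 5}

for depth in (10_000_000, 50_000_000, 100_000_000):
    found = detected_at_depth(counts_full, FULL_DEPTH, depth)
    print(f"{depth:>11,} reads: {len(found)} loci detected: {sorted(found)}")
```

The abundant locus is detected at every depth, whereas the rare loci only appear as depth increases, which mirrors the two observations in the results: shallow datasets still recover the most abundantly expressed elements, and deeper sequencing broadens the detected spectrum.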

CONCLUSION

In our systematic approach to determining sequencing and analysis parameters for the detection of locus-specific HERV expression, we provide evidence that analysis of RNAseq datasets from multiple studies can increase confidence in biological findings. When generating de novo HERV expression datasets, we recommend a higher sequencing depth (≥ 100 million reads) than is used in standard genic transcriptome pipelines. Finally, batch effect reduction measures need to be implemented to allow for differential expression analysis.
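To make the idea of batch effect reduction concrete, here is a minimal sketch of one very simple approach: per-study mean centering of log-scale expression for a single locus. This is not presented as the method used in the study, which the abstract does not specify; the function name, the toy values and the balanced design (each study contributes both CD4 and CD8 samples) are assumptions. Mean centering is only reasonable when cell types are balanced within each batch, because otherwise it would also remove real biological differences.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Remove per-batch shifts from log-expression values of one locus.

    Each value has its batch mean subtracted and the overall mean added back,
    so batches end up on a common level while within-batch differences remain.
    """
    overall = mean(values)
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# Toy log2 expression of one HERV locus: CD8 is 2 units above CD4 in both
# studies, but study_B carries an additional technical offset of +3.
log_expr = [5.0, 7.0, 8.0, 10.0]
cell_type = ["CD4", "CD8", "CD4", "CD8"]
study = ["study_A", "study_A", "study_B", "study_B"]

corrected = center_by_batch(log_expr, study)
for ct, st, raw, adj in zip(cell_type, study, log_expr, corrected):
    print(f"{st} {ct}: raw={raw:.1f} corrected={adj:.1f}")
```

Before correction, the CD4 sample from `study_B` (8.0) sits above the CD8 sample from `study_A` (7.0), so the study offset masks the cell-type difference; after centering, both CD4 samples and both CD8 samples line up and the 2-unit difference between cell types is visible across studies.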

