Improving gene isoform quantification with miniQuant.

Author information

Li Haoran, Wang Dingjie, Gao Qi, Tan Puwen, Wang Yunhao, Cai Xiaoyu, Li Aifu, Zhao Yue, Thurman Andrew L, Malekpour Seyed Amir, Zhang Ying, Sala Roberta, Cipriano Andrea, Wei Chia-Lin, Sebastiano Vittorio, Song Chi, Zhang Nancy R, Au Kin Fai

Affiliations

Gilbert S. Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.

Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.

Publication information

Nat Biotechnol. 2025 Jun 3. doi: 10.1038/s41587-025-02633-9.

DOI:10.1038/s41587-025-02633-9

PMID:40461779

Abstract

RNA sequencing has been widely applied for gene isoform quantification, but limitations exist in quantifying isoforms of complex genes accurately, especially for short reads. Here we identify genes that are difficult to quantify accurately with short reads and illustrate the information benefit of using long reads to quantify these regions. We present miniQuant, which ranks genes with quantification errors caused by the ambiguity of read alignments and integrates the complementary strengths of long reads and short reads with optimal combination in a gene- and data-specific manner to achieve more accurate quantification. These results are supported by rigorous mathematical proofs, validated with a wide range of simulation data, experimental validations and more than 17,000 public datasets from GTEx, TCGA and ENCODE consortia. We demonstrate miniQuant can uncover isoform switches during the differentiation of human embryonic stem cells to pharyngeal endoderm and primordial germ cell-like cells.
