Zajac Natalia, Vlachos Ioannis S, Sajibu Sija, Opitz Lennart, Wang Shuoshuo, Chittur Sridar V, Mason Christopher E, Knudtson Kevin L, Ashton John M, Rehrauer Hubert, Aquino Catharine
Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Zurich, Switzerland.
Spatial Technologies Unit, Department of Pathology, HMS Initiative for RNA Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
Genome Biol. 2025 May 28;26(1):145. doi: 10.1186/s13059-025-03613-7.
Transcriptome sequencing (RNA-seq) is a powerful technology for gene expression profiling. Selection of optimal parameters for cDNA library generation is crucial for acquisition of high-quality data. In this study, we investigate the impact of the amount of RNA and the number of PCR cycles used for sample amplification on the rate of PCR duplication and, in consequence, on the RNA-seq data quality.
For broader applicability, we sequenced the libraries on four short-read sequencing platforms: Illumina NovaSeq 6000, Illumina NovaSeq X, Element Biosciences AVITI, and Singular Genomics G4. The native Illumina libraries were converted for sequencing on AVITI and G4 to assess the effect of library conversion, which involves additional PCR cycles. We find that the rate of PCR duplicates depends on the combined effect of RNA input amount and the number of PCR cycles used for amplification. For input amounts lower than 125 ng, 34-96% of reads were discarded via deduplication, with the percentage increasing with lower input amount and decreasing with increasing PCR cycles. The reduced read diversity for low input amounts leads to fewer genes detected and increased noise in expression counts.
Data generated with each of the four sequencing platforms show a similar dependence of the PCR duplicate rate on starting material amount and number of PCR cycles, a similar number of detected genes, and comparable gene expression profiles.
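As a rough illustration of the quantity reported above (the share of reads discarded via deduplication), the following minimal sketch computes a duplication rate from a list of reads. It assumes, hypothetically, that two reads count as duplicates when they share the same chromosome, position, strand, and molecular barcode; the abstract does not specify the deduplication criteria, and the function name `duplication_rate` and the toy reads are invented for illustration only.

```python
def duplication_rate(read_keys):
    """Return the fraction of reads discarded when one read per unique key is kept.

    Each key is a hashable description of a read. The key layout used below
    (chromosome, position, strand, barcode) is an assumption for this sketch,
    not the criteria used in the study.
    """
    reads = list(read_keys)
    if not reads:
        return 0.0
    unique = len(set(reads))
    return 1.0 - unique / len(reads)


# Toy data: the first two reads are identical, so one of them is a duplicate.
reads = [
    ("chr1", 100, "+", "AACG"),
    ("chr1", 100, "+", "AACG"),
    ("chr1", 100, "+", "TTGA"),
    ("chr2", 500, "-", "AACG"),
]

rate = duplication_rate(reads)
print(f"{rate:.0%} of reads would be discarded")  # prints "25% of reads would be discarded"
```

On this scale, the reported 34-96% for inputs below 125 ng corresponds to values between 0.34 and 0.96.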