Branco Gabriela Pereira, Valieris Renan, Povoa Lucas Venezian, Araújo Luiza Ferreira de, Fernandes Gustavo Ribeiro, Souza Jorge Estefano Santana de, Amorim Maria Galli de, Ferreira Elisa Napolitano E, Silva Israel Tojal da, Nunes Diana Noronha, Dias-Neto Emmanuel
A.C.Camargo Cancer Center, Laboratório de Genômica Médica, CIPE, São Paulo, SP, Brazil.
A.C.Camargo Cancer Center, Laboratório de Biologia Computacional, São Paulo, SP, Brazil.
Genet Mol Biol. 2020 Apr 27;43(2):e20180351. doi: 10.1590/1678-4685-GMB-2018-0351. eCollection 2020.
Next-generation sequencing (NGS) platforms allow the analysis of hundreds of millions of molecules in a single sequencing run, revolutionizing many research areas. NGS-based microRNA studies enable expression quantification at an unprecedented scale without the limitations of closed platforms. Yet, although these platforms have produced a massive amount of data, comparisons of their quantification and discovery capabilities are still lacking. Here we compare two NGS platforms, SOLiD and PGM, by evaluating their microRNA identification and quantification capabilities using two breast-derived cell lines. A high expression correlation (R² > 0.9) was achieved, encompassing 97% of the miRNAs, and the few discrepancies in miRNA counts were attributable to molecules with very low expression. Quantification divergences indicative of artefactual representation were seen for 14 miRNAs that were higher in SOLiD reads and for another 10 miRNAs that were more abundant in PGM data. An inspection of these revealed an increased and statistically significant count of uracils and uracil stretches in PGM-enriched miRNAs, compared to SOLiD and to miRBase. In parallel, adenines and adenine stretches were enriched in SOLiD-derived miRNA reads. We conclude that, whereas both platforms are overall consistent and can be used interchangeably for microRNA expression studies, particular sequence features appear to be indicative of specific platform bias, and their presence in microRNAs should be considered in database analyses.
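The sequence features named above (base counts and homopolymer stretches) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequence, and the choice to treat a "stretch" as a run of at least two identical bases are all assumptions made for illustration, since the abstract does not define them.

```python
import re
from collections import Counter


def base_counts(seq):
    """Count each base in an RNA sequence (a DNA 'T' is treated as 'U')."""
    return Counter(seq.upper().replace("T", "U"))


def stretches(seq, base, min_len=2):
    """Return the lengths of runs of `base` that are at least `min_len` long.

    The min_len=2 default is an assumption; the abstract does not state
    how a stretch was defined.
    """
    rna = seq.upper().replace("T", "U")
    pattern = f"{re.escape(base)}{{{min_len},}}"
    return [len(m.group()) for m in re.finditer(pattern, rna)]


# Toy sequence invented for illustration; it is not a real miRNA.
toy = "UUUAGCAAUU"
u_count = base_counts(toy)["U"]
u_runs = stretches(toy, "U")
a_runs = stretches(toy, "A")
```

Applied to every miRNA in a platform-enriched set, per-sequence values like these could then be compared between groups (for example, PGM-enriched versus SOLiD-enriched miRNAs) with a statistical test of the reader's choosing.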