Departament de Genètica, Microbiologia i Estadística, and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Catalonia, Spain.
Methods Mol Biol. 2023;2680:1-27. doi: 10.1007/978-1-0716-3275-8_1.
Transcriptomic data (obtained from RNA sequencing) has become a very powerful source of information for reconstructing the evolutionary relationships among organisms. Although phylogenetic inference using transcriptomes retains the same core steps as when working with a few molecular markers (viz., nucleic acid extraction and sequencing, sequence treatment, and tree inference), each of these steps differs significantly. First, the extracted RNA must be of very high quantity and quality. Although this may not be a challenge when working with certain organisms, it can be difficult with others, especially those with small body sizes. Second, the tremendous increase in the number of sequences obtained requires high computational power both for treating the sequences and for inferring the subsequent phylogenies. This means that transcriptomic data can no longer be analyzed using personal computers or local programs with a graphical interface, which in turn demands a broader set of bioinformatic skills from researchers. Finally, the genomic peculiarities of each group of organisms, such as the level of heterozygosity or base composition, also need to be considered when inferring phylogenies using transcriptomic data.