Turudić Ante, Liber Zlatko, Grdiša Martina, Jakše Jernej, Varga Filip, Šatović Zlatko
Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska c. 25, 10000 Zagreb, Croatia.
Faculty of Agriculture, University of Zagreb, Svetošimunska c. 25, 10000 Zagreb, Croatia.
Plants (Basel). 2023 Jan 5;12(2):254. doi: 10.3390/plants12020254.
The development of bioinformatic solutions is guided by biological knowledge of the subject. In some cases we can use unambiguous biological models, while in others we rely on assumptions. A common assumption about genomes is that related species have similar genome sequences. This assumption is especially plausible for chloroplast genomes because they evolve slowly. We investigated whether the lengths of complete chloroplast genome sequences are closely related to the taxonomic proximity of the species. The study used all available sequences from the asterid and rosid clades. In general, chloroplast genome length distributions are narrow at both the family and genus levels. In addition, clear biological explanations have already been reported for the families and genera that exhibit particularly wide distributions. The main factors responsible for length variation are parasitic life forms, inverted repeat (IR) loss, IR expansions and contractions, and polyphyly. However, the presence of outliers in the distribution at the genus level is a strong indication of possible inaccuracies in sequence assembly.
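The idea of screening genus-level length distributions for outliers can be illustrated with a minimal sketch. The abstract does not state which outlier criterion the authors used, so the Tukey fence rule (1.5 times the interquartile range) below is an assumption, as are the function name `flag_length_outliers`, the `min_size` threshold, and the made-up genus names, accessions, and lengths in the example.

```python
from collections import defaultdict
from statistics import quantiles


def flag_length_outliers(records, k=1.5, min_size=4):
    """Flag chloroplast genome lengths that are outliers within their genus.

    records: iterable of (genus, accession, length_bp) tuples.
    Uses Tukey fences (Q1 - k*IQR, Q3 + k*IQR); this criterion is an
    assumption for illustration, not the method reported in the paper.
    """
    by_genus = defaultdict(list)
    for genus, accession, length in records:
        by_genus[genus].append((accession, length))

    flagged = []
    for genus, items in by_genus.items():
        # Quartiles are not meaningful for very small genera; skip them.
        if len(items) < min_size:
            continue
        lengths = [length for _, length in items]
        q1, _, q3 = quantiles(lengths, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        flagged.extend(
            (genus, accession, length)
            for accession, length in items
            if length < low or length > high
        )
    return flagged


# Hypothetical data: genus names, accessions, and lengths are invented.
example = [
    ("GenusA", "acc1", 155000),
    ("GenusA", "acc2", 155200),
    ("GenusA", "acc3", 155400),
    ("GenusA", "acc4", 155100),
    ("GenusA", "acc5", 155300),
    ("GenusA", "acc6", 120000),  # much shorter than its congeners
    ("GenusB", "acc7", 160000),
    ("GenusB", "acc8", 160500),
    ("GenusB", "acc9", 130000),  # genus too small to evaluate
]

for genus, accession, length in flag_length_outliers(example):
    print(genus, accession, length)
```

A flagged sequence is only a candidate for inspection. As the abstract notes, wide distributions can also have biological causes such as parasitic life forms or IR loss, so a flag alone does not establish an assembly error.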