Aguirre-Carvajal Kevin, Munteanu Cristian R, Armijos-Jaramillo Vinicio
Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, Campus Elviña s/n, 15071 Coruña, Spain.
Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito 170513, Ecuador.
Biology (Basel). 2024 Jun 26;13(7):469. doi: 10.3390/biology13070469.
Horizontal gene transfer (HGT) is widely acknowledged as a source of genetic diversity in prokaryotes. However, its impact in eukaryotes, particularly interdomain HGT, remains a topic of debate. Although biases in interdomain HGT detection have been reported, the effect of imbalanced databases has received little attention. In this study, we assessed how database composition affects the detection of interdomain HGT, using proteomes from the fungal subphylum Pezizomycotina as the focus group. Our objective was to simulate the imbalance common in public biological databases, where bacterial and eukaryotic sequences are unevenly represented, and to demonstrate that adding more eukaryotic sequences reduces the number of predicted HGTs. We used four databases that differed in their proportion of eukaryotic sequences while keeping the proportion of bacterial sequences constant. The number of detected interdomain HGT candidates decreased significantly as the proportion of eukaryotic sequences in the database increased. Our data suggest that database imbalance biases interdomain HGT detection, and they highlight the challenges of confirming interdomain HGT in Pezizomycotina fungi and potentially in other groups within Eukarya.
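The mechanism behind this effect can be illustrated with a toy model. This is not the authors' pipeline: it assumes a simplified best-hit criterion (a query protein is flagged as an interdomain HGT candidate when its highest-scoring hit in the database is bacterial), and the similarity scores, the names `Query`, `make_queries` and `count_candidates`, and the four eukaryotic fractions are all hypothetical. Every toy query is a native eukaryotic gene, so each flagged candidate is a false positive caused only by missing eukaryotic homologs.

```python
import random
from dataclasses import dataclass


@dataclass
class Query:
    """A hypothetical query protein with toy similarity scores to its homologs."""
    euk_scores: list[float]  # scores against eukaryotic homologs
    euk_ranks: list[float]   # random rank per eukaryotic homolog, in [0, 1)
    bac_scores: list[float]  # scores against bacterial homologs


def make_queries(n: int, seed: int = 0) -> list[Query]:
    """Generate native (non-transferred) eukaryotic query proteins.

    Toy assumption: each query resembles its eukaryotic homologs more
    closely, on average, than its bacterial ones.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(n):
        euk = [rng.uniform(0.4, 0.9) for _ in range(10)]
        ranks = [rng.random() for _ in range(10)]
        bac = [rng.uniform(0.2, 0.6) for _ in range(5)]
        queries.append(Query(euk, ranks, bac))
    return queries


def count_candidates(queries: list[Query], euk_fraction: float) -> int:
    """Count queries whose best hit in the database is bacterial.

    A eukaryotic homolog is in the database only if its rank is below
    euk_fraction, so databases built with larger fractions are supersets
    of smaller ones. The bacterial homologs are always present (held fixed).
    """
    count = 0
    for q in queries:
        present = [s for s, r in zip(q.euk_scores, q.euk_ranks) if r < euk_fraction]
        best_euk = max(present, default=float("-inf"))
        best_bac = max(q.bac_scores)
        if best_bac > best_euk:
            count += 1
    return count


if __name__ == "__main__":
    queries = make_queries(1000, seed=42)
    # Four hypothetical databases with increasing eukaryotic representation.
    for fraction in (0.0, 0.25, 0.5, 1.0):
        print(f"eukaryotic fraction {fraction:.2f}: "
              f"{count_candidates(queries, fraction)} HGT candidates")
```

With a eukaryotic fraction of 0.0 every query is flagged, because a bacterial hit is the only hit available. Because the databases are nested, adding eukaryotic sequences can only raise each query's best eukaryotic score, so the candidate count never increases as the fraction grows.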