Igarashi Yoji, Mori Daisuke, Mitsuyama Susumu, Yoshitake Kazutoshi, Ono Hiroaki, Watanabe Tsuyoshi, Taniuchi Yukiko, Sakami Tomoko, Kuwata Akira, Kobayashi Takanori, Ishino Yoshizumi, Watabe Shugo, Gojobori Takashi, Asakawa Shuichi
Department of Aquatic Bioscience, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo, Tokyo 113-8657, Japan.
Japan Software Management Co, Ltd., Yokohama, Kanagawa 221-0056, Japan.
Proteomes. 2019 Apr 29;7(2):19. doi: 10.3390/proteomes7020019.
Metagenomic data have mainly been used to show the composition of organisms based on a small, well-characterized part of the genome, such as ribosomal RNA genes and mitochondrial DNA. In contrast, whole metagenomic data obtained by shotgun sequencing have rarely been fully analyzed through homology searches, because the genomic data in databases are insufficient for the organisms living on Earth. To complement the results of homology-search-based methods on shotgun metagenome data, we focused on the composition of protein domains deduced from genome and metagenome sequences and used it to characterize genomes and metagenomes, respectively. First, we compared relationships based on similarity in protein domain composition with relationships based on sequence similarity. We searched the genomes of 325 bacterial species for protein domains using the Pfam database. Next, we calculated the correlation coefficient of protein domain composition for every pair of bacteria. Every pairwise genetic distance was also calculated from 16S rRNA or DNA gyrase subunit B sequences. Comparing the two approaches, we found a moderate correlation between them. Essentially the same results were obtained when we used random partial 100 bp DNA sequences from the bacterial genomes, which simulated raw data from short-read next-generation sequencers. We then applied the method to actual environmental data obtained by shotgun sequencing. The method showed a transition of the microbial phase due to the seasonal change in water temperature. These results demonstrate the usability of the method for characterizing metagenomic data based on protein domain composition.
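The pairwise comparison of protein domain compositions described in the abstract can be illustrated with a minimal sketch. It assumes that each genome has already been reduced to a list of domain hits (for example, from a Pfam search), that compositions are expressed as relative frequencies, and that the correlation coefficient is Pearson's; the abstract does not state these details, and the genome names, domain labels, and function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def domain_vector(hits, domains):
    """Relative frequency of each domain among one genome's hits."""
    counts = Counter(hits)
    total = sum(counts.values())
    return [counts[d] / total for d in domains]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when a composition is constant
    return cov / (sd_x * sd_y)


def pairwise_correlations(hits_by_genome):
    """Correlation of domain composition for every pair of genomes."""
    # Use the union of all observed domains as the shared axis.
    domains = sorted({d for hits in hits_by_genome.values() for d in hits})
    vectors = {g: domain_vector(h, domains) for g, h in hits_by_genome.items()}
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Toy input: placeholder genome names and domain labels, not real data.
toy_hits = {
    "genome_1": ["domA", "domA", "domB", "domC"],
    "genome_2": ["domA", "domA", "domB", "domC"] * 2,
    "genome_3": ["domC", "domC", "domB", "domA"],
}

correlations = pairwise_correlations(toy_hits)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In this sketch, genomes with identical relative compositions correlate perfectly regardless of how many hits they have, which is why the same kind of comparison could in principle be applied to short-read samples of differing depth. The resulting matrix could then be set against pairwise genetic distances from a marker gene, as the abstract describes.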