CAS Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, 666303, Yunnan, China.
State Key Laboratory of Grassland Agro-Ecosystem, College of Life Sciences, Lanzhou University, Lanzhou, China.
BMC Bioinformatics. 2019 Feb 13;20(1):75. doi: 10.1186/s12859-019-2670-3.
With well-assembled genomes now available for a growing number of organisms, the bioinformatic identification of whole genome duplication (WGD) has become a growing field of genomics. Most existing software for detecting WGD footprints is restricted to well-assembled genomes. However, the large number of poorly assembled genomes and the more readily accessible transcriptomes have been largely ignored, although in theory they could also be used to detect WGDs with the dS-based method (dS: the number of synonymous substitutions per synonymous site). To address these problems, we designed WGDdetector, a universal and simple tool for detecting WGDs in different organisms from either genome or transcriptome annotations, based on the widely used dS-based method.
We constructed the WGDdetector pipeline, which runs all analyses with a single command: gene family construction, dS estimation and phasing, and output of the dS values for each paralog pair. To test its practicability, we chose four species representing herbaceous plants, woody plants, and animals: Arabidopsis thaliana, Juglans regia, Populus trichocarpa, and Xenopus laevis. With both genome and transcriptome data, our results agreed closely with those of previous studies.
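To illustrate what the dS estimation step computes for one paralog pair, here is a minimal Python sketch of one classic estimator: the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, applied to two codon-aligned coding sequences. This is an assumption for illustration only; the abstract does not state which estimator WGDdetector uses, and the function names (`syn_sites`, `syn_differences`, `ng86_ds`) are hypothetical. When counting sites, changes to stop codons are treated as nonsynonymous, which is one of several conventions in use.

```python
import itertools
import math

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites of one codon: at each position, the fraction of the
    three possible base changes that keep the amino acid. Changes to stop
    codons count as nonsynonymous here (one of several conventions)."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def syn_differences(c1, c2):
    """Synonymous differences between two sense codons, averaged over every
    order of single-base steps from c1 to c2 that avoids stop codons.
    Returns None if every pathway passes through a stop codon."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0
    total, n_paths = 0, 0
    for order in itertools.permutations(diff):
        current, syn, valid = c1, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            current = nxt
        if valid:
            total += syn
            n_paths += 1
    return total / n_paths if n_paths else None


def ng86_ds(seq1, seq2):
    """Jukes-Cantor corrected dS between two codon-aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    sites = diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps/ambiguous bases and stop codons in either sequence.
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        sd = syn_differences(c1, c2)
        if sd is None:
            continue
        sites += (syn_sites(c1) + syn_sites(c2)) / 2  # average over both codons
        diffs += sd
    if sites == 0:
        raise ValueError("no comparable synonymous sites")
    p_s = diffs / sites  # proportion of synonymous differences
    arg = 1 - 4 * p_s / 3
    # Jukes-Cantor correction; p_s >= 0.75 means saturation (dS undefined).
    return -0.75 * math.log(arg) if arg > 0 else math.inf
```

For example, `ng86_ds("CTTCTT", "CTTCTC")` has two synonymous sites and one synonymous difference, so pS = 0.5 and dS = 0.75 ln 3 (about 0.824). In a real pipeline the two sequences would first be aligned codon by codon, and a maximum-likelihood estimator may be preferred over this counting method.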
WGDdetector is not only reliable and stable for genome data but also offers a new way of using transcriptome data to obtain an accurate dS distribution for detecting WGDs. The source code is freely available and runs on both Windows and Linux.
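The dS-based method reads WGDs off the distribution of paralog-pair dS values. In the usual interpretation, ongoing small-scale duplication produces many young pairs near dS = 0, while a WGD creates many pairs at once that then diverge at a similar pace, so they show up as a secondary peak at larger dS. The Python sketch below bins a list of dS values and reports local maxima. The bin width (0.05), the upper cutoff (dS <= 2, to drop likely saturated values), the naive local-maximum rule, and the names `ds_histogram` and `local_peaks` are illustrative assumptions, not WGDdetector's settings or output format.

```python
import math


def ds_histogram(ds_values, bin_width=0.05, max_ds=2.0):
    """Count paralog-pair dS values in equal-width bins over (0, max_ds].
    Values <= 0 or > max_ds (likely saturated) are dropped."""
    n_bins = math.ceil(max_ds / bin_width)
    counts = [0] * n_bins
    for ds in ds_values:
        if 0 < ds <= max_ds:
            # Clamp so that ds == max_ds lands in the last bin.
            counts[min(int(ds / bin_width), n_bins - 1)] += 1
    return counts


def local_peaks(counts, bin_width=0.05, min_count=1):
    """Return the bin-centre dS of every bin that is higher than both of its
    neighbours and holds at least min_count pairs."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i + 1 < len(counts) else 0
        if c >= min_count and c > left and c > right:
            peaks.append((i + 0.5) * bin_width)
    return peaks


# Toy values only: many young duplicates near dS = 0 plus a second cluster.
toy = [0.02] * 6 + [0.07] * 4 + [0.12] * 2 + [0.76] * 2 + [0.81] * 5 + [0.87] * 3
counts = ds_histogram(toy + [0.0, 2.5])  # the two out-of-range values are dropped
print(local_peaks(counts))
```

On this toy input the sketch finds two peaks, near dS = 0.025 (the young-duplicate background) and near dS = 0.825 (the cluster a WGD would produce). Single-bin maxima are sensitive to noise, so more careful analyses typically smooth the distribution or fit mixture models before calling a WGD peak.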