Sorbonne Universités, UPMC Univ Paris 06, Univ Antilles Guyane, Univ Nice Sophia Antipolis, CNRS, Evolution Paris Seine - Institut de Biologie Paris Seine (EPS - IBPS), Paris, France.
CNRS, UPMC, FR2424, ABiMS, Station Biologique, Roscoff, France.
Mol Ecol. 2018 May;27(10):2365-2380. doi: 10.1111/mec.14579. Epub 2018 May 3.
Dinoflagellates are one of the most abundant and functionally diverse groups of eukaryotes. Although genomic information for dinoflagellates remains scarce overall, the steadily growing body of high-throughput sequencing resources can be used to characterize and compare these organisms. We assembled de novo and processed 46 dinoflagellate transcriptomes and used a sequence similarity network (SSN) to compare the underlying genomic basis of functional features within the group. This approach provides the most comprehensive picture to date of the genomic potential of dinoflagellates. We identified a core predicted proteome composed of 252 connected components (CCs) of putative conserved protein domains (pCDs). Of these, 206 were novel and 16 lacked any functional annotation in public databases. Integrating functional information into our network analyses allowed us to investigate pCDs specifically associated with functional traits. With respect to toxicity, sequences homologous to proteins (e.g., sxtA4 and sxtG) found in species with toxicity potential were not specific to known toxin-producing species. Although not fully specific to symbiosis, the most represented functions associated with proteins involved in the symbiotic trait were related to membrane processes and ion transport. Overall, our SSN approach identified 45,207 specific and constitutive pCDs for the toxic species and 90,794 for the symbiotic species represented in our analyses. Of these, 56% and 57%, respectively (25,393 and 52,193 pCDs), completely lacked annotation in public databases. This underscores how limited our knowledge remains, while highlighting the potential of SSNs to identify candidate pCDs for further functional genomic characterization.
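The abstract does not spell out how the network was built, so what follows is only a minimal sketch of the general idea behind a sequence similarity network. Sequences are nodes, pairwise similarity hits that pass a threshold are edges, connected components (CCs) are groups of linked sequences, and a "core" CC is one with members from every transcriptome. The function names, the toy hit table, and the identity and coverage thresholds are all hypothetical illustrations, not the values or tools used in the study.

```python
from collections import defaultdict


def build_edges(hits, min_identity=80.0, min_coverage=0.8):
    """Keep pairwise hits that pass the (assumed, illustrative) thresholds."""
    return [
        (query, subject)
        for query, subject, identity, coverage in hits
        if query != subject and identity >= min_identity and coverage >= min_coverage
    ]


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in nodes:
        groups[find(node)].add(node)
    return list(groups.values())


def core_components(components, sample_of, all_samples):
    """Return the components that contain sequences from every sample."""
    required = set(all_samples)
    return [cc for cc in components if {sample_of[n] for n in cc} == required]


# Toy data (hypothetical): three transcriptomes A, B, C with two sequences each.
# Each hit is (query, subject, percent identity, alignment coverage).
hits = [
    ("A1", "B1", 95.0, 0.90),
    ("B1", "C1", 90.0, 0.85),
    ("A2", "B2", 88.0, 0.95),
    ("A2", "C2", 60.0, 0.90),  # below the identity threshold, so no edge
]
nodes = ["A1", "A2", "B1", "B2", "C1", "C2"]
sample_of = {node: node[0] for node in nodes}

edges = build_edges(hits)
components = connected_components(nodes, edges)
core = core_components(components, sample_of, all_samples={"A", "B", "C"})
```

In this toy example only the component containing A1, B1, and C1 spans all three transcriptomes, so it is the only one treated as core. The same per-component bookkeeping could be extended to flag components restricted to species with a given trait (for example toxic or symbiotic), again under whatever thresholds a real analysis would justify.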