Kong Jinhwa, Won Jungim, Yoon Jeehee, Lee UnJoo, Kim Jong-Il, Huh Sun
Department of Computer Engineering, College of Engineering, Hallym University, Chuncheon 24252, Korea.
Smart Computing Lab., Hallym University, Chuncheon 24252, Korea.
Korean J Parasitol. 2016 Dec;54(6):751-758. doi: 10.3347/kjp.2016.54.6.751. Epub 2016 Dec 31.
This study aimed to construct a draft genome of the adult female worm using next-generation sequencing (NGS) and de novo assembly, and to find new genes after annotation with functional genomics tools. Using an NGS machine, we produced DNA read data from the worm. De novo assembly of the DNA reads was performed using SOAPdenovo, and the RNA reads were assembled using Trinity. Structural annotation, homology search, functional annotation, classification of protein domains, and KEGG pathway analysis were carried out. In addition, recently developed tools such as MAKER, PASA, Evidence Modeler, and Blast2GO were used. The assembled scaffolds had an N50 of 108,950 bp and a total length of 341,776,187 bp. The transcriptome assembly had an N50 of 940 bp and a total length of 53,046,952 bp. The GC content of the entire genome was 39.3%. The total number of genes was 20,178, and the total number of protein sequences was 22,358. Of the 22,358 protein sequences, 4,992 were newly observed in this species. The following previously unknown proteins were found: E3 ubiquitin-protein ligase cbl-b and antigen T-cell receptor, zeta chain for T-cell and B-cell regulation; endoprotease bli-4 for cuticle metabolism; mucin 12Ea and polymorphic mucin variant C6/1/40r2.1 for mucin production; and tropomodulin-family protein and ryanodine receptor calcium release channels for muscle movement. We found new hypothetical polypeptide sequences unique to this species, and these findings can serve as a basis for extending our biological understanding of it.
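The N50 and GC content figures reported above are standard assembly summary statistics. As a minimal sketch of how such values are commonly computed (not the authors' pipeline; the function names `n50` and `gc_content` and the toy inputs are hypothetical and used only for illustration), N50 is the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly length:

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def gc_content(sequence):
    """Return the GC percentage of a DNA sequence.

    Only A, C, G and T are counted; ambiguous bases such as N are
    ignored (an assumption of this sketch, conventions vary by tool).
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Toy data, not taken from the study.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
toy_gc = gc_content("GGCCAATT")
```

Sorting from longest to shortest is what makes N50 sensitive to contiguity: an assembly made of a few long scaffolds reaches the halfway point sooner, and so reports a larger N50, than one with the same total length split into many short pieces.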