Huazhong University of Science and Technology, China.
Brief Bioinform. 2018 Nov 27;19(6):1325-1336. doi: 10.1093/bib/bbx074.
Different tissues and diseases have distinct transcriptional profiles with specifically expressed genes (SEGs). The identification of SEGs is therefore important for studies of gene function, biological development, disease mechanisms and biomarker discovery. However, few accurate and easy-to-use tools are available for detecting SEGs in RNA sequencing (RNA-seq) data. Here, we present SEGtool, a tool based on fuzzy c-means, the Jaccard index and a greedy annealing method that detects SEGs automatically and self-adaptively, regardless of data distribution. Test results showed that SEGtool outperforms existing tools, which were mainly developed for microarray data. By applying SEGtool to the Genotype-Tissue Expression (GTEx) human tissue data set, we detected 3181 SEGs with tissue-related functions. Regulatory networks reveal tissue-specific transcription factors regulating many SEGs, such as ETV2 in testis, HNF4A in liver and NEUROD1 in brain. In a case study applying SEGtool to single-cell sequencing (SCS) data from embryo cells, we identified many SEGs in specific stages of human embryogenesis. Notably, SEGtool is suitable for RNA-seq data and even SCS data, with high specificity and accuracy. An R package implementation of SEGtool is freely available at http://bioinfo.life.hust.edu.cn/SEGtool/.
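For readers unfamiliar with two of the building blocks named in the abstract, the following is a minimal Python sketch of the Jaccard index and of the standard fuzzy c-means membership formula for one-dimensional values. It is not SEGtool's implementation (SEGtool is distributed as an R package, and the abstract does not describe how these pieces are combined). The function names `jaccard_index` and `fcm_memberships`, the fixed cluster centers and the fuzzifier `m = 2` are illustrative assumptions.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


def fcm_memberships(values, centers, m=2.0):
    """Textbook fuzzy c-means membership of each 1-D value in each cluster.

    u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)), where d_ij is the
    distance from value j to center i. Centers are given, not estimated;
    a full fuzzy c-means run would alternate this step with center updates.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Value lies exactly on one or more centers: split full
            # membership among those centers to avoid dividing by zero.
            zero_count = dists.count(0.0)
            row = [1.0 / zero_count if d == 0.0 else 0.0 for d in dists]
        else:
            row = [
                1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                for d_i in dists
            ]
        memberships.append(row)
    return memberships


if __name__ == "__main__":
    # Hypothetical tissue sets, used only to exercise the function.
    print(jaccard_index({"liver", "testis"}, {"liver", "brain"}))
    # Hypothetical expression values with assumed "low" and "high" centers.
    print(fcm_memberships([0.0, 2.5, 5.0, 10.0], centers=[0.0, 10.0]))
```

Each row returned by `fcm_memberships` sums to 1, so a value midway between two centers receives equal membership in both, while a value near one center is assigned mostly to that cluster.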