Université Côte d'Azur, Center of Modeling, Simulation and Interactions, Nice 06000, France.
Université Côte d'Azur, Inserm U1081, CNRS UMR 7284, Institute for Research on Cancer and Aging, Nice (IRCAN), Centre Hospitalier Universitaire (CHU) de Nice, Nice 06200, France.
Bioinformatics. 2022 Oct 14;38(20):4754-4761. doi: 10.1093/bioinformatics/btac603.
Current advances in omics technologies are paving the way for the diagnosis of rare diseases by offering a complementary assay to identify the responsible gene. The use of transcriptomic data to identify aberrant gene expression (AGE) has been shown to reveal potentially pathogenic events. However, popular approaches for AGE identification are limited by their reliance on statistical tests, which require choosing an arbitrary cut-off for significance assessment and the availability of several replicates, something that is not always possible in clinical contexts.
Hence, we developed ABerrant Expression Identification empLoying machine LEarning from sequencing data (ABEILLE), a variational autoencoder (VAE)-based method for identifying AGEs from RNA-seq data without the need for replicates or a control group. ABEILLE combines a VAE, which can model any data without specific assumptions about their distribution, with a decision tree that classifies genes as AGE or non-AGE. An anomaly score is associated with each gene in order to stratify AGEs by the severity of aberration. We tested ABEILLE on a semi-synthetic dataset and an experimental dataset, demonstrating the importance of the flexibility of the VAE configuration for identifying potential pathogenic candidates.
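The idea of scoring genes by how far observed expression departs from a model's reconstruction can be illustrated with a minimal sketch. This is not the ABEILLE implementation: the trained VAE is replaced here by a hypothetical stand-in (`median_reconstruction`, which returns the per-gene median), the score is an assumed absolute log2 ratio, the decision-tree classification step is omitted, and the gene names and counts are invented toy values.

```python
import math
from statistics import median


def median_reconstruction(row):
    """Hypothetical stand-in for the output of a trained VAE.

    A real VAE would reconstruct each value from a learned latent space;
    here every sample is simply 'reconstructed' as the gene's median.
    """
    m = median(row)
    return [m] * len(row)


def anomaly_scores(counts, reconstruct):
    """Score each (gene, sample) pair by its reconstruction discrepancy.

    The score is |log2((observed + 1) / (reconstructed + 1))|, an assumed
    measure chosen for illustration only.
    """
    scores = {}
    for gene, row in counts.items():
        recon_row = reconstruct(row)
        for sample_idx, (obs, rec) in enumerate(zip(row, recon_row)):
            scores[(gene, sample_idx)] = abs(
                math.log2((obs + 1.0) / (rec + 1.0))
            )
    return scores


def rank_candidates(scores, top_n=3):
    """Return the top_n (gene, sample) pairs, highest score first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Toy expression counts (invented): one row per gene, one column per sample.
counts = {
    "GENE_A": [100, 110, 95, 105],
    "GENE_B": [50, 48, 52, 400],  # the last sample is a deliberate outlier
    "GENE_C": [10, 12, 11, 9],
}

scores = anomaly_scores(counts, median_reconstruction)
top = rank_candidates(scores, top_n=1)
```

In this toy setting the highest-ranked pair is the deliberately perturbed sample of `GENE_B`. Ranking by score mirrors the stratification of AGEs by severity described above; in the method itself, a decision tree rather than a simple ranking decides which genes are labelled AGE.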
ABEILLE source code is freely available at: https://github.com/UCA-MSI/ABEILLE.
Supplementary data are available at Bioinformatics online.