Yamamoto Ryo, Liu Zhiheng, Choudhury Mudra, Xiao Xinshu
Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, USA.
Department of Integrative Biology and Physiology, University of California, Los Angeles, California, USA.
bioRxiv. 2023 Jun 7:2023.06.02.543466. doi: 10.1101/2023.06.02.543466.
Double-stranded RNAs (dsRNAs) are potent triggers of innate immune responses upon recognition by cytosolic dsRNA sensor proteins. Identification of endogenous dsRNAs helps to better understand the dsRNAome and its relevance to innate immunity related to human diseases. Here, we report dsRID (double-stranded RNA identifier), a machine learning-based method to predict dsRNA regions, leveraging the power of long-read RNA-sequencing (RNA-seq) and molecular traits of dsRNAs. Using models trained with PacBio long-read RNA-seq data derived from Alzheimer's disease (AD) brain, we show that our approach is highly accurate in predicting dsRNA regions in multiple datasets. Applied to an AD cohort sequenced by the ENCODE consortium, we characterize the global dsRNA profile with potentially distinct expression patterns between AD and controls. Together, we show that dsRID provides an effective approach to capture global dsRNA profiles using long-read RNA-seq data.