Department of Chemistry, University of Connecticut, Storrs, Connecticut 06269, United States.
Department of Chemistry, University at Albany (SUNY), Albany, New York 12222, United States.
Anal Chem. 2021 Jun 8;93(22):7860-7869. doi: 10.1021/acs.analchem.1c00359. Epub 2021 May 27.
We propose a novel approach for building a classification/identification framework based on the full complement of RNA post-transcriptional modifications (rPTMs) expressed by an organism under basal conditions. The approach relies on advanced mass spectrometry techniques to characterize the products of exonuclease digestion of total RNA extracts. Sample profiles comprising the identities and relative abundances of all detected rPTMs were used to train and test the capabilities of different machine learning (ML) algorithms. Each algorithm proved capable of identifying rigorous decision rules for differentiating closely related classes and correctly assigning unlabeled samples. The ML classifiers resolved different members of the family, alternative serotypes, a series of knockout mutants, and primary cells of the central nervous system, which shared very similar genetic backgrounds. The excellent levels of accuracy and resolving power achieved by training on a limited number of classes were successfully replicated when the number of classes was significantly increased to escalate complexity. A dendrogram generated from ML-curated data exhibited a hierarchical organization that closely resembled those afforded by established taxonomic systems. Finer clustering patterns revealed the extensive effects induced by the deletion of a single pivotal gene. This information provided a putative roadmap for exploring the roles of rPTMs in their respective regulatory networks, which will be essential for deciphering the epitranscriptomics code. The ubiquitous presence of RNA in virtually all living organisms promises to enable the broadest possible range of applications, with significant implications for the diagnosis of RNA-related diseases.
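As an illustration of the general idea of classifying samples from rPTM relative-abundance profiles, the following is a minimal sketch, not the authors' pipeline. The abstract does not name the ML algorithms used, so a simple nearest-centroid rule stands in for them here. The class labels, the choice of modifications (m6A, m5C, Psi), and all intensity values are hypothetical and chosen only to make the example runnable.

```python
from math import sqrt


def normalize(profile):
    """Convert raw signal intensities into relative abundances that sum to 1."""
    total = sum(profile.values())
    return {mod: value / total for mod, value in profile.items()}


def centroid(profiles):
    """Average a list of normalized profiles, treating missing rPTMs as 0."""
    mods = {mod for p in profiles for mod in p}
    return {m: sum(p.get(m, 0.0) for p in profiles) / len(profiles) for m in mods}


def distance(a, b):
    """Euclidean distance between two profiles over the union of their rPTMs."""
    mods = set(a) | set(b)
    return sqrt(sum((a.get(m, 0.0) - b.get(m, 0.0)) ** 2 for m in mods))


def train(labeled_samples):
    """Build one centroid per class from (label, raw_profile) pairs."""
    by_class = {}
    for label, profile in labeled_samples:
        by_class.setdefault(label, []).append(normalize(profile))
    return {label: centroid(ps) for label, ps in by_class.items()}


def classify(model, profile):
    """Assign an unlabeled raw profile to the class with the nearest centroid."""
    p = normalize(profile)
    return min(model, key=lambda label: distance(model[label], p))


# Hypothetical training data: invented intensities for two invented classes.
training = [
    ("class_A", {"m6A": 30, "m5C": 10, "Psi": 60}),
    ("class_A", {"m6A": 28, "m5C": 12, "Psi": 60}),
    ("class_B", {"m6A": 10, "m5C": 40, "Psi": 50}),
    ("class_B", {"m6A": 12, "m5C": 38, "Psi": 50}),
]

model = train(training)
predicted = classify(model, {"m6A": 58, "m5C": 22, "Psi": 120})
print(predicted)
```

Normalizing to relative abundances before comparison means that two samples with different total signal but the same modification proportions are treated as equivalent, which matches the abstract's emphasis on relative rather than absolute abundances.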