Department of Head and Neck Surgery, Renji Hospital, School of Medicine, Shanghai Jiaotong University, 160 Pujian Road, Pudong District, Shanghai, 200127, China.
Fun-med Pharmaceutical Technology (Shanghai) Co., Ltd., RM. A310, 115 Xinjunhuan Road, Minhang District, Shanghai, 201100, China.
Endocrine. 2021 Jun;72(3):758-783. doi: 10.1007/s12020-020-02523-x. Epub 2020 Nov 12.
To assess the capacity of support vector machine (SVM) algorithms developed from platelet RNA-seq data to identify thyroid neoplasm patients and to differentiate patients with thyroid adenomas, papillary thyroid cancer, and metastasized papillary thyroid cancer.
Platelets were collected and isolated from 109 patients and 63 healthy controls. RNA-seq was performed to find transcripts with differential levels. Genes corresponding to these altered transcripts were identified using R packages. All samples were split into a training set and a validation set. Two SVM algorithms were developed and trained on the training set, using the genes with differential transcript levels (GDTLs) as classifiers, and validated on the validation set. GO and KEGG pathway enrichment analyses were performed using the R package clusterProfiler.
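The split into training and validation sets can be illustrated with a minimal sketch. The abstract does not state the split ratio or whether the split was stratified by class, so the 70/30 ratio, the stratification, the fixed seed, and the function name `stratified_split` below are all assumptions made for illustration, not the authors' procedure. Only the cohort sizes (109 patients, 63 controls) come from the text.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training and validation sets.

    Each class is shuffled and cut separately, so both sets keep roughly
    the same class proportions. The ratio and seed are illustrative
    assumptions; the abstract does not report them.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train_idx.extend(shuffled[:cut])
        valid_idx.extend(shuffled[cut:])

    return sorted(train_idx), sorted(valid_idx)


# Cohort sizes taken from the abstract: 109 patients and 63 healthy controls.
labels = ["patient"] * 109 + ["control"] * 63
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Stratifying keeps the patient-to-control ratio similar in both sets, which matters when the classes are unbalanced as they are here.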
We detected 765 GDTLs (442 up-regulated and 323 down-regulated) in platelets of patients and healthy controls. The algorithm identifying thyroid neoplasm patients achieved an accuracy of 97%, with an area under the curve (AUC) of 0.998. The other algorithm, which differentiated patients with multiclass thyroid neoplasms, had an average accuracy of 80.5%. GO analysis showed that GDTLs were strongly involved in biological processes such as neutrophil degranulation, neutrophil activation, autophagy, and regulation of multi-organism process. KEGG pathway enrichment analysis revealed that GDTLs were mainly enriched in the NOD-like receptor signaling pathway and in the endocytosis, osteoclast differentiation, human cytomegalovirus infection, and tuberculosis pathways.
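For readers less familiar with the two reported metrics, the following sketch shows how accuracy and AUC can be computed for a binary classifier. The labels and decision scores are made-up toy values, not data from the study, and the function names are hypothetical. AUC is computed here in its rank-based form: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    sample has the higher score; ties contribute one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only): 1 = patient, 0 = healthy control.
y_true = [1, 1, 1, 0, 0, 0]
# Signed decision scores, such as an SVM might output; positive means "patient".
scores = [2.1, 0.7, -0.3, 0.2, -1.0, -1.5]
y_pred = [1 if s > 0 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, auc)
```

Accuracy depends on the decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures can differ as they do in the reported results.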
Our results indicated that the combination of SVM algorithms and platelet RNA-seq data allowed for thyroid neoplasm diagnostics and multiclass thyroid neoplasm classification.