AcornMed Biotechnology Co., Ltd., Floor 18, Block 5, Yard 18, Kechuang 13 RD, Beijing, 100176, China.
Department of Pathology, The Second Affiliated Hospital of Zhejiang University School of Medicine, No. 88 Jiefang Road, Shangcheng District, Hangzhou, 310009, Zhejiang, China.
BMC Bioinformatics. 2021 Apr 12;22(1):185. doi: 10.1186/s12859-021-03986-z.
Microsatellite instability (MSI) is a common genomic alteration in colorectal cancer, endometrial carcinoma, and other solid tumors. MSI is characterized by a high degree of polymorphism in microsatellite lengths caused by a deficient mismatch repair system. Based on the degree of instability, tumors can be classified as microsatellite instability-high (MSI-H) or microsatellite stable (MSS). MSI is a predictive biomarker for immunotherapy efficacy in advanced/metastatic solid tumors, especially in colorectal cancer patients. Several computational approaches based on targeted panel sequencing data have been used to detect MSI; however, they are considerably affected by sequencing depth and panel size.
We developed MSIFinder, a Python package for automatic MSI classification from sequencing data that uses a random forest classifier (RFC), a machine learning method. The training set comprised 19 MSI-H and 25 MSS samples. We selected 54 feature markers from the training set, built an RFC model, and validated the classifier on a test set comprising 21 MSI-H and 379 MSS samples. On this test set, MSIFinder achieved a sensitivity (recall) of 1.0, a specificity of 0.997, an accuracy of 0.998, a positive predictive value of 0.954, an F1 score of 0.977, and an area under the curve of 0.999. To further verify the robustness and effectiveness of the model, we applied it to a prospective cohort of 18 MSI-H and 122 MSS samples, on which MSIFinder achieved a sensitivity (recall) of 1.0 and a specificity of 1.0. MSIFinder was also less affected by low sequencing depth, achieving a concordance of 0.993 at a sequencing depth of 100×, and less affected by panel size, achieving a concordance of 0.99 with a panel size of 0.5 M (million bases).
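The test-set metrics above can be related to one another with a short calculation. The abstract does not report the confusion matrix, so the counts below (21 true positives, 1 false positive, 378 true negatives, 0 false negatives) are an assumption chosen because they are consistent with the test-set size and reproduce the reported values to within rounding in the third decimal place. The function name `classification_metrics` is hypothetical and is not part of MSIFinder.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall: fraction of MSI-H samples detected
    specificity = tn / (tn + fp)                 # fraction of MSS samples correctly called
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all samples correctly called
    ppv = tp / (tp + fp)                         # positive predictive value (precision)
    f1 = 2 * tp / (2 * tp + fp + fn)             # harmonic mean of precision and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "f1": f1,
    }


# ASSUMED counts (not stated in the abstract): 21 MSI-H samples all detected,
# and 1 of the 379 MSS samples called MSI-H.
metrics = classification_metrics(tp=21, fp=1, tn=378, fn=0)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, a single false positive among the 379 MSS samples would account for the specificity, accuracy, positive predictive value, and F1 score all falling slightly below 1.0 while sensitivity remains 1.0.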
These results indicate that MSIFinder is a robust and effective MSI classification tool that can provide reliable MSI detection for scientific and clinical purposes.