Odense Amyloidosis Center, Odense University Hospital, 5000 Odense, Denmark.
Centre for Clinical Proteomics, Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, 5000 Odense, Denmark.
Int J Mol Sci. 2021 Dec 28;23(1):319. doi: 10.3390/ijms23010319.
Amyloidosis is a rare disease caused by the misfolding and extracellular aggregation of proteins as insoluble fibrillary deposits localized either in specific organs or systemically throughout the body. The organ targeted, the disease progression, and the outcome are highly dependent on the specific fibril-forming protein, and its accurate identification is essential to the choice of treatment. Mass spectrometry-based proteomics has become the method of choice for the identification of the amyloidogenic protein. Regrettably, this identification relies on manual and subjective interpretation of mass spectrometry data by an expert, which is undesirable and may bias diagnosis. To circumvent this, we developed a statistical model-assisted method for the unbiased identification of amyloid-containing biopsies and amyloidosis subtyping, based on data from mass spectrometric analysis of amyloid-containing biopsies and corresponding controls. The Boruta method, used with a random forest classifier, was applied to proteomics data obtained from the mass spectrometric analysis of 75 laser-dissected Congo Red-positive amyloid-containing biopsies and 78 Congo Red-negative biopsies to identify novel "amyloid signature" proteins that included clusterin, fibulin-1, vitronectin, complement component C9, and three collagen proteins, as well as the well-known amyloid signature proteins apolipoprotein E, apolipoprotein A4, and serum amyloid P. A support vector machine (SVM) learning algorithm was trained on the mass spectrometry data from the analysis of the 75 amyloid-containing biopsies and 78 amyloid-negative control biopsies. The trained algorithm showed superior performance in the discrimination of amyloid-containing biopsies from controls, with an accuracy of 1.0 when applied to a blinded mass spectrometry validation data set of 103 prospectively collected amyloid-containing biopsies. Moreover, our method successfully classified amyloidosis patients according to subtype in 102 out of 103 blinded cases.
Collectively, our model-assisted approach identified novel amyloid-associated proteins and demonstrated the use of mass spectrometry-based data in clinical diagnostics of disease by the unbiased and reliable model-assisted classification of amyloid deposits and of the specific amyloid subtype.
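The two modelling steps named in the abstract (Boruta-style feature selection with a random forest, followed by an SVM classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and NumPy, uses synthetic data in place of the proteomics measurements, replaces Boruta's statistical test with a simple "beats the best shadow feature in more than half of the rounds" rule, and assumes a linear SVM kernel. The function names, the number of features, and the mixed synthetic validation set are hypothetical; only the 75/78 training group sizes come from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def shadow_feature_selection(X, y, n_rounds=10, seed=0):
    """Simplified Boruta-style selection (not the full Boruta procedure).

    Each round appends column-wise shuffled "shadow" copies of all features,
    fits a random forest, and records a hit for every real feature whose
    importance exceeds the best shadow importance.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for _ in range(n_rounds):
        # Shuffle each column independently to break its link with the labels.
        shadows = rng.permuted(X, axis=0)
        forest = RandomForestClassifier(
            n_estimators=100, random_state=int(rng.integers(1_000_000))
        )
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        hits += importances[:n_features] > importances[n_features:].max()
    # Assumed rule: keep features that beat the shadows in over half the rounds.
    return np.flatnonzero(hits > n_rounds / 2)


def make_synthetic_cohort(n_pos, n_neg, n_features=30, informative=(0, 1, 2), seed=0):
    """Hypothetical stand-in for protein abundance data (purely synthetic)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_pos + n_neg, n_features))
    y = np.r_[np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)]
    # Shift the informative columns upward in the positive group only.
    X[:n_pos, list(informative)] += 3.0
    return X, y


# Training group sizes follow the abstract; everything else is illustrative.
X_train, y_train = make_synthetic_cohort(75, 78, seed=1)
X_valid, y_valid = make_synthetic_cohort(50, 50, seed=2)

selected = shadow_feature_selection(X_train, y_train)

# Linear-kernel SVM on the selected features (kernel choice is an assumption).
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train[:, selected], y_train)
accuracy = model.score(X_valid[:, selected], y_valid)

print("selected feature indices:", selected.tolist())
print("validation accuracy:", accuracy)
```

Shuffled shadow features give a data-driven baseline for how much importance pure noise can reach, which is the core idea behind Boruta. The published method may differ in its acceptance test, preprocessing, and SVM settings.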