SATBBI (South African Tuberculosis Bioinformatics Initiative), Centre for Bioinformatics and Computational Biology, Stellenbosch University, Cape Town, 7505, South Africa.
DST-NRF Centre of Excellence for Biomedical Tuberculosis Research, Cape Town, 7505, South Africa.
Proteomics. 2020 Nov;20(21-22):e1900382. doi: 10.1002/pmic.201900382. Epub 2020 Aug 23.
The increasing amount of publicly available proteomics data creates opportunities for data scientists to investigate quality metrics in novel ways. QuaMeter IDFree is used to generate quality metrics from 665 RAW files and 97 WIFF files representing publicly available "shotgun" mass spectrometry datasets. These experiments are selected to represent Mycobacterium tuberculosis lysates, mouse myeloid-derived suppressor cells (MDSCs), and exosomes derived from human cell lines. Machine learning techniques are demonstrated to detect outliers within experiments, and it is shown that quality metrics may be used to distinguish sources of variability among these experiments. In particular, nested ANOVA performed on a principal component analysis of an SDS-PAGE shotgun experiment shows that runs of fractions from the same gel region cluster together, rather than runs grouping by technical replicate, temporal proximity, or even biological sample. This indicates that the individual fraction may have had a higher impact on the quality metrics than other factors. In addition, sample type, instrument type, mass analyzer, fragmentation technique, and digestion enzyme are identified as sources of variability. From a quality control perspective, the findings illustrate the importance of study design, and in particular of run order, in limiting the impact of technical variability.
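The abstract does not say which machine learning techniques or which metrics were used for outlier detection, so the following is only a minimal sketch of the general idea: project per-run quality metrics onto their first principal component and flag runs whose score is far from the rest. It assumes a plain table of numeric metrics per run, swaps in power iteration for a full PCA, and uses the modified z-score (median and MAD, cutoff 3.5) as the outlier rule. The function names, the three columns, and the toy values are all hypothetical and are not taken from the study or from QuaMeter output.

```python
import math
import statistics


def standardize(rows):
    """Column-wise z-scores; a constant column becomes all zeros."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]


def first_component(z, iterations=200):
    """Approximate leading eigenvector of the covariance matrix (power iteration)."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    # Asymmetric start vector, so it is unlikely to be orthogonal
    # to the leading eigenvector.
    vec = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec


def pc1_scores(rows):
    """Score of each run on the first principal component."""
    z = standardize(rows)
    vec = first_component(z)
    return [sum(a * b for a, b in zip(row, vec)) for row in z]


def find_outlier_runs(rows, threshold=3.5):
    """Indices of runs whose PC1 score has a modified z-score above threshold."""
    scores = pc1_scores(rows)
    med = statistics.median(scores)
    mad = statistics.median(abs(s - med) for s in scores)
    if mad == 0:
        return []  # no spread among the scores, nothing to flag
    return [
        i for i, s in enumerate(scores)
        if 0.6745 * abs(s - med) / mad > threshold
    ]


# Made-up toy values for illustration only (hypothetical columns:
# spectrum count, median peak width, charge-state ratio).
toy_metrics = [
    [1000, 30.0, 0.50],
    [1020, 31.0, 0.52],
    [980, 29.5, 0.49],
    [1010, 30.5, 0.51],
    [990, 30.2, 0.50],
    [400, 55.0, 0.90],  # deliberately aberrant run
]

print(find_outlier_runs(toy_metrics))
```

On the toy table, only the deliberately aberrant last run (index 5) is flagged. Using the median and MAD rather than the mean and standard deviation keeps a single extreme run from inflating the spread estimate and masking itself. A real analysis would likely inspect several components and outliers within each experiment separately, as the abstract describes.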