Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, The University of Newcastle, Callaghan, Australia.
PLoS One. 2011 Mar 24;6(3):e17481. doi: 10.1371/journal.pone.0017481.
In November 2007, a study published in Nature Medicine proposed a simple test, based on the abundance of 18 proteins in blood, to predict the onset of the clinical symptoms of Alzheimer's Disease (AD) two to six years before they manifest. A later study, published in PLoS ONE, showed that a set of only five proteins (IL-1, IL-3, EGF, TNF-α and G-CSF) has better overall prediction accuracy. Both classifiers are derived from the measured abundance of 120 proteins. These values were standardised by a Z-score transformation, which means that each value is expressed relative to the average of all the others.
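As an illustration of the Z-score transformation mentioned above, the following is a minimal sketch, not the procedure used in either study. It assumes that each protein is standardised separately across samples and that the population standard deviation is used; the function name `z_scores` and the abundance values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a list of abundances to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cannot standardise a constant feature")
    return [(v - mu) / sigma for v in values]


# Hypothetical abundances of one protein across four samples.
abundances = [2.0, 4.0, 4.0, 6.0]
standardised = z_scores(abundances)
```

After the transformation the values have mean zero, so each one reads as a number of standard deviations above or below the average abundance of that protein.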
We further studied the original datasets from the Nature Medicine paper using methods from combinatorial optimisation and Information Theory. We expanded the original dataset by also including all pair-wise differences of its z-score values ("metafeatures"). Feature selection was posed as a Feature Set problem and solved with an exact algorithm; this yielded signatures containing only features, only metafeatures, or both, whose predictive performance we evaluated on the independent test set.
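The construction of metafeatures as pair-wise differences of z-scores can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the function name `add_metafeatures`, the naming scheme for the new columns, and the sample values are all hypothetical.

```python
from itertools import combinations


def add_metafeatures(sample):
    """Return the sample extended with all pair-wise differences.

    `sample` maps a protein name to its z-score. Each unordered pair
    of proteins (a, b) contributes one metafeature, a minus b.
    """
    extended = dict(sample)
    for a, b in combinations(sorted(sample), 2):
        extended[f"{a} - {b}"] = sample[a] - sample[b]
    return extended


# Hypothetical z-scores for three proteins in one sample.
sample = {"EGF": 0.5, "IL-3": -1.0, "G-CSF": 1.5}
extended = add_metafeatures(sample)
```

With n features this adds n(n-1)/2 metafeatures, so a panel of 120 proteins would gain 7,140 pair-wise differences.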
We show that a specific pattern of cell signalling imbalance in blood plasma carries valuable information for distinguishing between Non Demented Control (NDC) and AD samples. The signatures obtained predicted AD in patients who already had Mild Cognitive Impairment (MCI) with up to 84% sensitivity, while maintaining a strong prediction accuracy of 90% on an independent dataset of NDC and AD samples. The novel biomarkers uncovered with this method confirm ANG-2, IL-11, PDGF-BB and CCL15/MIP-1δ, and support the joint measurement of other signalling proteins not previously discussed: GM-CSF, NT-3, IGFBP-2 and VEGF-B.
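For readers less familiar with the two performance measures quoted above, the sketch below shows how sensitivity and accuracy are computed from true and predicted labels. The labels are invented for illustration and do not reproduce the study's results; the function names are hypothetical.

```python
def sensitivity(y_true, y_pred, positive="AD"):
    """Fraction of truly positive samples that were predicted positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label is correct."""
    if not y_true:
        raise ValueError("empty label list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Invented labels: four AD samples and six NDC samples.
y_true = ["AD"] * 4 + ["NDC"] * 6
y_pred = ["AD"] * 3 + ["NDC"] * 7

sens = sensitivity(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

Sensitivity only looks at the AD samples, whereas accuracy counts correct calls over both classes, which is why the two figures can differ on the same test set.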