Chen Hao, Poon Josiah, Poon Simon K, Cui Lizhi, Fan Kei, Sze Daniel
BMC Bioinformatics. 2015;16 Suppl 12(Suppl 12):S4. doi: 10.1186/1471-2105-16-S12-S4. Epub 2015 Aug 25.
Quality control of complex mixtures, including herbal medicines, is no longer limited to the chemical chromatographic definition of one or two selected compounds; multivariate linear regression methods with dimension reduction or regularisation have been used to predict the bioactivity capacity from the chromatographic fingerprints of herbal extracts. This type of analysis requires a multi-dimensional approach because it is challenging at two levels: firstly, each herb is a complex mixture of active and non-active chemical components; and secondly, many factors relate to the growth, production, and processing of herbal products. Together, these factors result in significantly diverse concentrations of bioactive compounds in herbal products. There is therefore a pressing need for a predictive model with better generalisation that can accurately predict the bioactivity capacity of samples when only the chemical fingerprint data are available.
In this study, the Stacking Multivariate Linear Regression (SMLR) algorithm and several other commonly used chemometric approaches were evaluated. Each method was used to predict the Cluster of Differentiation 80 (CD80) expression bioactivity of a commonly used herb, Astragali Radix (AR), from the corresponding chemical chromatographic fingerprints. SMLR provided superior prediction accuracy compared with the other multivariate linear regression methods, namely principal component regression (PCR), partial least squares regression (PLSR), orthogonal projections to latent structures (OPLS) and elastic net (EN), in terms of the mean squared error on test samples (MSEtest) and the goodness of prediction of test samples.
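The general idea of stacking linear regressors and scoring them by MSEtest can be illustrated with a minimal sketch. It is not the authors' SMLR implementation, whose details are not given in this abstract. As assumptions for illustration only, the base learners are ridge regressions with different penalties (standing in for the regularised linear models), the meta-learner is ordinary least squares fitted on out-of-fold predictions, and the data are synthetic rather than real chromatographic fingerprints. All function names (`fit_ridge`, `fit_stack`, `predict_stack`) are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def predict_linear(model, X):
    coef, intercept = model
    return X @ coef + intercept


def fit_stack(X, y, alphas, n_folds=5, seed=0):
    """Level 0: one ridge model per alpha.
    Level 1: least squares fitted on out-of-fold level-0 predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.zeros((len(y), len(alphas)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for j, alpha in enumerate(alphas):
            model = fit_ridge(X[train], y[train], alpha)
            oof[held_out, j] = predict_linear(model, X[held_out])
    # Meta-learner: intercept plus one weight per base learner.
    design = np.column_stack([np.ones(len(y)), oof])
    meta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Refit every base learner on all training samples for later prediction.
    base_models = [fit_ridge(X, y, alpha) for alpha in alphas]
    return base_models, meta


def predict_stack(stack, X):
    base_models, meta = stack
    level0 = np.column_stack([predict_linear(m, X) for m in base_models])
    return meta[0] + level0 @ meta[1:]


# Synthetic stand-in for fingerprints: 60 samples, 200 "retention-time" features
# (assumed sizes), with only five features carrying signal.
rng = np.random.default_rng(42)
n_samples, n_features = 60, 200
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

X_train, X_test = X[:45], X[45:]
y_train, y_test = y[:45], y[45:]

stack = fit_stack(X_train, y_train, alphas=[0.1, 10.0, 1000.0])
y_pred = predict_stack(stack, X_test)
mse_test = float(np.mean((y_test - y_pred) ** 2))
print(f"MSEtest of the stacked model: {mse_test:.3f}")
```

Fitting the meta-learner on out-of-fold predictions, rather than on in-sample fits, is what keeps the combination weights from simply favouring the base learner that overfits the training set most.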
SMLR is a better platform than some multivariate linear regression methods. Its first advantage is better generalisation when predicting the bioactivity capacity of herbal medicines from their chromatographic fingerprints. Its second advantage is that single chemical compounds can be effectively identified as highly bioactive components, which require further CD80 bioactivity confirmation. Future studies should aim to further improve the SMLR algorithm.