Kamath Padmaja, Fernandez Alberto, Giralt Francesc, Rallo Robert
Departament d'Enginyeria Informatica i Matematiques, Escola Tecnica Superior d'Enginyeria, Universitat Rovira i Virgili, Av. Paisos Catalans 26, 43007 Tarragona, Spain.
Curr Top Med Chem. 2015;15(18):1930-7. doi: 10.2174/1568026615666150506152808.
In real-world applications, nanoparticles are likely to interact with mixtures of proteins and other biomolecules that adsorb onto their surface, forming the so-called protein corona. Information on the composition of the protein corona and on net cell association was collected from the literature for a library of surface-modified gold and silver nanoparticles. For each protein in the corona, sequence information was extracted and used to calculate physicochemical properties and statistical descriptors. Data cleaning and preprocessing techniques, including statistical analysis and feature selection, were applied to remove highly correlated, redundant and non-significant features. A weighting technique was then applied to construct signatures that represent the corona composition of each nanoparticle. Using this basic set of protein descriptors, a new Protein Corona Structure-Activity Relationship (PCSAR) was developed and validated; it relates net cell association to the physicochemical descriptors of the proteins that form the corona. The selected features were consistent with previously published literature, and the computational model built on them showed good accuracy (leave-one-out R² = 0.76; leave-many-out (25%) R² = 0.72) and stability. An additional advantage is that fingerprints based on physicochemical descriptors are independent of the specific proteins that form the corona.
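The abstract does not say which weighting technique or which model type was used, so the following is only a minimal sketch under stated assumptions: the corona signature is taken to be an abundance-weighted average of per-protein descriptor vectors, the model is a single-descriptor least-squares line, and the leave-one-out R² is computed as 1 - PRESS/SS_tot. All function names and all numbers are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def corona_fingerprint(abundances, descriptors):
    """Abundance-weighted mean of per-protein descriptor vectors.

    Assumption: weights are relative abundances; the paper's actual
    weighting scheme is not given in the abstract.
    """
    total = sum(abundances.values())
    n_desc = len(next(iter(descriptors.values())))
    fingerprint = [0.0] * n_desc
    for protein, amount in abundances.items():
        weight = amount / total
        for j, value in enumerate(descriptors[protein]):
            fingerprint[j] += weight * value
    return fingerprint


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_loo(xs, ys):
    """Leave-one-out R^2, defined here as 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    my = mean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical toy corona: two proteins, two descriptors each.
toy_abundances = {"protein_A": 3.0, "protein_B": 1.0}
toy_descriptors = {"protein_A": [1.0, 0.0], "protein_B": [5.0, 4.0]}
toy_fingerprint = corona_fingerprint(toy_abundances, toy_descriptors)
```

Because the fingerprint is a fixed-length vector of averaged descriptors, two nanoparticles with entirely different corona proteins can still be compared, which is the independence property the abstract highlights. A leave-many-out estimate would follow the same pattern as `r2_loo`, holding out a random 25% of samples per round instead of a single one.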