Sun Yiwen, Du Pengju, Lu Xingxing, Xie Pengfei, Qian Zhengfang, Fan Shuting, Zhu Zexuan
National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Department of Biomedical Engineering, School of Medicine, Shenzhen University, Shenzhen 518060, China.
College of Electronic Science and Technology, Shenzhen University, Shenzhen 518060, China.
Biomed Opt Express. 2018 Jun 6;9(7):2917-2929. doi: 10.1364/BOE.9.002917. eCollection 2018 Jul 1.
The development of new spectral analysis methods for biological thin-film detection has generated intense interest in terahertz (THz) spectroscopy and its applications in a wide range of fields. In this paper, machine learning methods are applied for the first time to the quantitative characterization of deposited bovine serum albumin (BSA) thin films measured by terahertz time-domain spectroscopy. The spectral data of BSA thin films prepared from solutions with concentrations ranging from 0.5 to 35 mg/ml are analyzed using support vector regression to learn the underlying model relating the frequency response to the target concentration. The learned model successfully predicts the concentrations of unknown test samples with a coefficient of determination R² = 0.97932. Furthermore, to identify the relevance of each frequency to the concentration, maximal information coefficient statistical analysis is used, and the three most discriminating frequencies in the THz range are identified at 1.2, 1.1 and 0.5 THz, respectively, which means that a good prediction of BSA concentration can be achieved using the top three relevant frequencies. Moreover, the top discriminating frequencies are in good agreement with the frequencies predicted by a long-wavelength elastic vibration model for the BSA protein.
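The abstract reports prediction quality as a coefficient of determination. As a reminder of what that number measures, here is a minimal Python sketch of the standard definition, 1 - SS_res / SS_tot. The function name `r_squared` and the sample concentrations are hypothetical and chosen only for illustration; they are not the authors' code or data, and the paper's own evaluation procedure may differ.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations in mg/ml, NOT taken from the paper.
true_conc = [0.5, 5.0, 15.0, 35.0]
pred_conc = [1.0, 4.0, 16.0, 34.0]
score = r_squared(true_conc, pred_conc)
```

A value of 1 means the predictions match the true concentrations exactly, while a value of 0 means the model does no better than always predicting the mean concentration.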