Loudermilk J Brian, Himmelsbach David S, Barton Franklin E, de Haseth James A
Department of Chemistry, The University of Georgia, Athens, GA 30602-2556, USA.
Appl Spectrosc. 2008 Jun;62(6):661-70. doi: 10.1366/000370208784657968.
During harvest, a variety of plant-based contaminants are collected along with cotton lint. Because these contaminants degrade yarn quality, the USDA previously created a mid-infrared, attenuated total reflection (ATR), Fourier transform infrared (FT-IR) spectral library of cotton contaminants for use in contaminant identification. This library has shown impressive identification rates for extremely similar cellulose-based contaminants in cases where the library was representative of the samples searched. When the searched spectra came from contaminant samples of crops grown in different geographic locations, seasons, and conditions, and were measured with a different spectrometer and accessories, identification rates for standard search algorithms decreased significantly. Six standard algorithms were examined: dot product, correlation, sum of absolute values of differences, sum of the square root of the absolute values of differences, sum of absolute values of differences of derivatives, and sum of squared differences of derivatives. Four categories of contaminants derived from cotton plants were considered: leaf, stem, seed coat, and hull. Experiments revealed that the performance of the standard search algorithms depended upon the category of sample being searched and that different algorithms provided complementary information about sample identity. These results indicated that no single standard algorithm could be chosen in advance to search the library. Three voting-scheme algorithms, based on result frequency, result rank, category frequency, or a combination of these factors for the results returned by the standard algorithms, were developed and tested for their capability to overcome the unpredictability of the standard algorithms' performance. The group voting-scheme search was based on the number of library spectra from each category of samples returned in the top ten results of the standard algorithms. This group algorithm correctly identified as many test spectra as the best standard algorithm, without relying on human choice to select a standard algorithm to perform the searches.
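The six standard metrics and the group voting idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the normalization of the dot product, the use of first differences as the "derivative", the conversion of every metric to a lower-is-better score, and the plain (unweighted) category count are all assumptions. The function names, the `(category, spectrum)` library layout, and the toy spectra are hypothetical.

```python
import math
from collections import Counter


def _unit(v):
    """Scale a spectrum to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def _diff(v):
    """First difference, used here as a simple stand-in for the derivative."""
    return [b - a for a, b in zip(v, v[1:])]


# Every metric returns a score where LOWER means a better match.
def dot_product(a, b):
    return 1.0 - sum(x * y for x, y in zip(_unit(a), _unit(b)))


def correlation(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    # Pearson r equals the normalized dot product of mean-centered vectors.
    return dot_product([x - mean_a for x in a], [y - mean_b for y in b])


def abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def sqrt_abs_diff(a, b):
    return sum(math.sqrt(abs(x - y)) for x, y in zip(a, b))


def abs_diff_deriv(a, b):
    return abs_diff(_diff(a), _diff(b))


def sq_diff_deriv(a, b):
    return sum((x - y) ** 2 for x, y in zip(_diff(a), _diff(b)))


ALGORITHMS = [dot_product, correlation, abs_diff,
              sqrt_abs_diff, abs_diff_deriv, sq_diff_deriv]


def top_hits(query, library, metric, k=10):
    """Categories of the k best-matching library spectra for one metric."""
    ranked = sorted(library, key=lambda entry: metric(query, entry[1]))
    return [category for category, _ in ranked[:k]]


def group_vote(query, library, k=10):
    """Pool the top-k hits of all metrics and pick the most frequent category."""
    votes = Counter()
    for metric in ALGORITHMS:
        votes.update(top_hits(query, library, metric, k))
    return votes.most_common(1)[0][0], votes


# Toy data only: two hypothetical categories with five-point "spectra".
toy_library = [
    ("leaf", [1.0, 2.0, 3.0, 2.0, 1.0]),
    ("leaf", [1.1, 2.1, 2.9, 2.0, 1.0]),
    ("stem", [3.0, 2.0, 1.0, 2.0, 3.0]),
    ("stem", [2.9, 2.1, 1.0, 2.1, 3.0]),
]
toy_query = [1.0, 2.0, 3.1, 2.0, 1.1]
winner, tally = group_vote(toy_query, toy_library, k=2)
print(winner, dict(tally))
```

Because the vote is taken over categories rather than individual library spectra, no single metric has to be chosen beforehand. Whether the published method also weights the counts by rank or by the number of library spectra per category is not stated in the abstract, so this sketch leaves the counts unweighted.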