Liu Jie, Pan Jing-chang, Luo A-li, Wei Peng, Liu Meng
Guang Pu Xue Yu Guang Pu Fen Xi. 2015 Dec;35(12):3524-8.
Distance metrics are an important issue in spectroscopic survey data processing: a metric defines how the distance between two spectra is calculated. Classification, clustering, parameter measurement, and outlier mining of spectral data all build on this definition, so the choice of distance measure affects the performance of each of these tasks. With the development of large-scale stellar spectroscopic sky surveys, defining a more effective distance metric for stellar spectra has become a very important issue in spectral data processing. To address this problem, and taking full account of the characteristics of stellar spectra, a new distance measure for stellar spectra named Residual Distribution Distance is proposed. With this method, the two spectra are first scaled, and the standard deviation of the residual is then used as the distance. Unlike traditional distance metrics for stellar spectra, the method normalizes the two spectra to the same scale, calculates the residual at each common wavelength, and uses the standard error of the residual spectrum as the distance measure. The measure can be used for stellar classification, clustering, the measurement of stellar atmospheric physical parameters, and so on. This paper takes stellar subclass classification as an example to test it. The results show that the distance defined by the proposed method describes the gap between different types of spectra in classification more effectively than other methods, and the method can be applied well in other related applications. The paper also studies the effect of the signal-to-noise ratio (SNR) on the performance of the proposed method. The results show that the distance is affected by the SNR: the smaller the SNR, the greater its impact on the distance, whereas when the SNR is larger than 10 it has little effect on classification performance.
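The procedure described in the abstract (scale both spectra, take the residual at each wavelength, use its spread as the distance) can be illustrated with a minimal sketch. The abstract does not state how the spectra are scaled, so this sketch assumes both spectra are already sampled on the same wavelength grid and divides each by its mean flux. The function name `residual_distribution_distance` and the synthetic spectra are hypothetical and are not taken from the paper.

```python
import numpy as np


def residual_distribution_distance(flux_a, flux_b):
    """Sketch of a residual-based distance between two spectra.

    Assumptions (not from the paper): both flux arrays share the same
    wavelength grid, and scaling means dividing by the mean flux.
    """
    flux_a = np.asarray(flux_a, dtype=float)
    flux_b = np.asarray(flux_b, dtype=float)
    if flux_a.shape != flux_b.shape:
        raise ValueError("spectra must be sampled on the same wavelength grid")

    # Bring both spectra to a common scale (assumed: mean-flux normalization).
    scaled_a = flux_a / np.mean(flux_a)
    scaled_b = flux_b / np.mean(flux_b)

    # Residual at each wavelength, then its standard deviation as the distance.
    residual = scaled_a - scaled_b
    return float(np.std(residual))


# Synthetic example spectra (illustrative values only).
wavelength = np.linspace(4000.0, 7000.0, 500)
continuum = 1.0 + 0.0001 * (wavelength - 4000.0)
absorption = 0.3 * np.exp(-0.5 * ((wavelength - 6563.0) / 10.0) ** 2)

spec_a = continuum               # reference spectrum
spec_b = 5.0 * continuum         # same shape, different overall flux level
spec_c = continuum - absorption  # same continuum with an absorption line

d_same = residual_distribution_distance(spec_a, spec_b)
d_diff = residual_distribution_distance(spec_a, spec_c)
```

In this sketch a pure change in overall flux level gives a distance of essentially zero, while a change in spectral shape gives a positive distance. That is the behavior the scaling step is meant to provide. A different scaling choice, such as a least-squares scale factor, would change the numerical values.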