Machine learning based stellar classification with highly sparse photometry data.

Authors

Cody Seán Enis, Scher Sebastian, McDonald Iain, Zijlstra Albert, Alexander Emma, Cox Nick

Affiliations

Know-Center GmbH, Graz, 8010, Austria.

The University of Manchester, Manchester, England, M13 9PL, UK.

Publication information

Open Res Eur. 2024 Aug 28;4:29. doi: 10.12688/openreseurope.17023.2. eCollection 2024.

Abstract

BACKGROUND

Identifying stars of different classes is vital for building statistical samples of the different phases and pathways of stellar evolution. In an era of surveys covering billions of stars, an automated method of identifying these classes becomes necessary.

METHODS

Many classes of stars are identified based on their emitted spectra. In this paper, we combine the multi-class, multi-label Machine Learning (ML) method XGBoost with the PySSED spectral-energy-distribution fitting algorithm to classify stars into nine different classes based on their photometric data. The classifier is trained on subsets of the SIMBAD database. Particular challenges are the very high sparsity (large fraction of missing values) of the underlying data and the high class imbalance. We discuss the available variables: photometric measurements on the one hand and indirect predictors, such as Galactic position, on the other.
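The two data challenges named above can be made concrete with a small sketch. The Python snippet below is illustrative only: the band names, class labels, toy values, and the inverse-frequency weighting scheme are assumptions made for this example and are not taken from the paper. It measures sparsity as the fraction of missing photometric measurements and derives per-class weights of the kind a classifier could use to counter class imbalance.

```python
from collections import Counter

# Toy catalogue: each row is one star; None marks a missing measurement.
# Band names and magnitudes are hypothetical, not values from SIMBAD.
BANDS = ["band_1", "band_2", "band_3", "band_4", "band_5"]
rows = [
    {"band_1": 12.1, "band_2": None, "band_3": None, "band_4": 10.9, "band_5": None},
    {"band_1": None, "band_2": 9.8, "band_3": 9.5, "band_4": None, "band_5": None},
    {"band_1": 14.3, "band_2": None, "band_3": None, "band_4": None, "band_5": None},
    {"band_1": None, "band_2": None, "band_3": None, "band_4": 8.2, "band_5": 8.0},
]
labels = ["class_a", "class_a", "class_a", "class_b"]  # hypothetical labels


def sparsity(rows, bands):
    """Fraction of (star, band) cells that have no measurement."""
    total = len(rows) * len(bands)
    missing = sum(1 for row in rows for band in bands if row.get(band) is None)
    return missing / total


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


overall_sparsity = sparsity(rows, BANDS)
class_weights = inverse_frequency_weights(labels)
print(f"sparsity = {overall_sparsity:.2f}")
print(class_weights)
```

In this toy catalogue 13 of the 20 cells are empty, and the rarer class receives the larger weight. Whether the authors used weighting of this kind is not stated in the abstract; tree-based methods such as XGBoost can also accept missing values directly, without imputation.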

RESULTS

We show the difference in performance when certain variables are excluded, and discuss which variables should be used in which contexts. Finally, we show that increasing the number of samples of a particular type of star significantly increases the performance of the model for that type, while having little to no impact on other types. The accuracy of the main classifier is ∼0.7 with a macro F1 score of 0.61.
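To make the two reported metrics concrete, here is a minimal sketch of how accuracy and macro F1 are computed. The labels are made up for illustration and are not the paper's data. Macro F1 averages the per-class F1 scores with equal weight, so a poorly recovered rare class lowers it more than it lowers accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_per_class(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical imbalanced sample: 8 stars of a common class, 2 of a rare one.
y_true = ["common"] * 8 + ["rare"] * 2
# The classifier recovers every common star but only one of the two rare ones.
y_pred = ["common"] * 8 + ["common", "rare"]

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {macro:.3f}")
```

In this toy case a single missed rare star costs the rare class a third of its F1 score, so macro F1 falls below accuracy. A gap in the same direction appears in the reported figures (∼0.7 accuracy, 0.61 macro F1), although the abstract does not break the scores down by class.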

CONCLUSIONS

While the current accuracy of the classifier is not high enough for reliable use in stellar classification, this work is an initial proof of feasibility for using ML to classify stars based on photometry.
