Earth and Planetary Sciences, University of California, Riverside, California, USA.
Department of Earth Sciences, University of Toronto, Toronto, Canada.
Astrobiology. 2024 Nov;24(11):1110-1127. doi: 10.1089/ast.2024.0019. Epub 2024 Oct 25.
We propose a novel approach to identify the origin of pyrite grains and distinguish biologically influenced sedimentary pyrite using combined sulfur isotope (δ³⁴S) and trace element (TE) analyses. To classify and predict the origin of individual pyrite grains, we applied multiple machine-learning algorithms to coupled δ³⁴S and TE data from pyrite grains formed by diverse sedimentary, hydrothermal, and metasomatic processes across geologic time. Our unsupervised classification algorithm, K-means++ cluster analysis, yielded six classes based on the formation environment of the pyrite: sedimentary, low-temperature hydrothermal, medium-temperature, polymetallic hydrothermal, high-temperature, and large euhedral. We tested three supervised models (random forest [RF], Naïve Bayes, and k-nearest neighbors); RF outperformed the others in predicting pyrite formation type, achieving a precision (area under the ROC curve) of 0.979 ± 0.005 and an overall average class accuracy of 0.878 ± 0.005. Moreover, we found that coupling TE and δ³⁴S data significantly improved the performance of the RF model compared with using either TE or δ³⁴S data alone. Our data provide a novel framework for exploring sedimentary rocks that have undergone multiple hydrothermal, magmatic, and metamorphic alterations. Most significant, however, is the demonstrated potential for distinguishing between biogenic and abiotic pyrite in samples from early Earth. This approach could also be applied to the search for potential biosignatures in samples returned from Mars.
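The unsupervised step described above, K-means++ clustering, can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the two features stand in for a standardized sulfur isotope value and a standardized log trace-element concentration, and three clusters are used instead of six for brevity. The names `kmeanspp_init`, `kmeans`, and `make_blobs` are hypothetical helpers written for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """K-means++ seeding: the first center is drawn uniformly; each later
    center is drawn with probability proportional to the squared distance
    from a point to its nearest already-chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd iterations started from K-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels

def make_blobs(rng, blob_centers, n_per, spread):
    """Synthetic stand-in data: Gaussian blobs around the given centers."""
    return [tuple(rng.gauss(m, spread) for m in c)
            for c in blob_centers for _ in range(n_per)]

# Assumed, purely illustrative feature space:
# (standardized sulfur isotope value, standardized log10 trace-element level).
data_rng = random.Random(42)
points = make_blobs(data_rng, [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
                    n_per=50, spread=0.1)
centers, labels = kmeans(points, k=3, seed=1)
```

Seeding with distance-weighted draws tends to spread the initial centers across distinct groups, which is why K-means++ is usually preferred over uniform random initialization. With real grain data, the features would need scaling first so that isotope values and element concentrations contribute comparably to the distance.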