Advancing predictive modeling in archaeology: An evaluation of regression and machine learning methods on the Grand Staircase-Escalante National Monument.

Affiliations

Department of Anthropology, University of Utah, Salt Lake City, Utah, United States of America.

Archaeological Center, University of Utah, Salt Lake City, Utah, United States of America.

Publication information

PLoS One. 2020 Oct 1;15(10):e0239424. doi: 10.1371/journal.pone.0239424. eCollection 2020.

DOI:10.1371/journal.pone.0239424

PMID:33002016

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7529236/

Abstract

Predictive models are central to both archaeological research and cultural resource management. Yet, archaeological applications of predictive models are often insufficient due to small training data sets, inadequate statistical techniques, and a lack of theoretical insight to explain the responses of past land use to predictor variables. Here we address these critiques and evaluate the predictive power of four statistical approaches widely used in ecological modeling (generalized linear models, generalized additive models, maximum entropy, and random forests) to predict the locations of Formative Period (2100-650 BP) archaeological sites in the Grand Staircase-Escalante National Monument. We assess each modeling approach using a threshold-independent measure, the area under the curve (AUC), and threshold-dependent measures, like the true skill statistic. We find that the majority of the modeling approaches struggle with archaeological datasets due to the frequent lack of true-absence locations, which violates model assumptions of generalized linear models, generalized additive models, and random forests, as well as measures of their predictive power (AUC). Maximum entropy is the only method tested here which is capable of utilizing pseudo-absence points (inferred absence data based on known presence data) and controlling for a non-representative sampling of the landscape, thus making maximum entropy the best modeling approach for common archaeological data when the goal is prediction. Regression-based approaches may be more applicable when prediction is not the goal, given their grounding in well-established statistical theory. Random forests, while the most powerful, is not applicable to archaeological data except in the rare case where true-absence data exist. Our results have significant implications for the application of predictive models by archaeologists for research and conservation purposes and highlight the importance of understanding model assumptions.
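The two kinds of evaluation measure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the scores, and the 0.5 threshold are hypothetical and chosen only for illustration. AUC is computed here as the probability that a randomly chosen site location scores higher than a randomly chosen absence location, and the true skill statistic (TSS) as sensitivity plus specificity minus one at a fixed threshold.

```python
def auc(presence_scores, absence_scores):
    """Threshold-independent: chance that a presence outscores an absence (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def true_skill_statistic(presence_scores, absence_scores, threshold):
    """Threshold-dependent: sensitivity + specificity - 1 at the given cutoff."""
    # Sensitivity: share of known sites predicted as suitable.
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    # Specificity: share of absence locations predicted as unsuitable.
    specificity = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sensitivity + specificity - 1.0


# Hypothetical model outputs (predicted suitability between 0 and 1).
site_scores = [0.9, 0.8, 0.7, 0.4]      # known site locations
absence_scores = [0.6, 0.3, 0.2, 0.1]   # absence (or pseudo-absence) locations

print(auc(site_scores, absence_scores))                                # 0.9375
print(true_skill_statistic(site_scores, absence_scores, threshold=0.5))  # 0.5
```

Both functions need scores for absence locations. If those locations are pseudo-absences rather than surveyed true absences, the resulting numbers describe how well the model separates known sites from background points, which is the limitation of AUC that the abstract raises.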

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74f9/7529236/652ad070d81a/pone.0239424.g001.jpg
