Wang Eryu, Luo Liping, Wang Jiachuan, Dai Jiaxin, Li Shuangyin, Chen Lei, Li Jia
Innovation, Policy and Entrepreneurship Thrust, Society Hub, The Hong Kong University of Science and Technology (Guangzhou), No.1 Duxue Road, Nansha, Guangzhou, 511453, China.
Data Science and Analytics Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou), No.1 Duxue Road, Nansha, Guangzhou, 511453, China.
Sci Data. 2025 May 1;12(1):724. doi: 10.1038/s41597-025-05037-1.
Amine-impregnated solid adsorbents are widely explored for point-source capture and direct air capture (DAC) to address climate change. The existing literature is a valuable source of data on amine-functionalized solid adsorbents. This study selected 52 articles from bibliographic platforms using GPT-assisted data source screening, and 1,336 data points were collected manually. Each data point is characterized by 28 features, including the CO2 capture performance of various adsorbents from dilute to concentrated sources, resulting in 29,857 records. The methodology resolves inconsistencies in units and terminology across the published articles and demonstrates the reliability, regularity and integrity of the database through statistical analysis. The diverse types of amines and mesoporous solids in the database offer innovation potential for future research. In addition, two machine learning models were trained to promote reuse of the dataset by scientists in lab-based research and cheminformatics. This study provides opportunities to explore the use of machine learning on small databases and encourages data sharing and uniform reporting across DAC communities.