Peng Shuang, Rajjou Loïc
Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000 Versailles, France.
Data Brief. 2024 Aug 22;56:110822. doi: 10.1016/j.dib.2024.110822. eCollection 2024 Oct.
Leguminous crops are vital to sustainable agriculture due to their ability to fix atmospheric nitrogen, improving soil fertility and reducing the need for synthetic fertilizers. Additionally, they are an excellent source of protein for both human consumption and animal feed. Antimicrobial peptides (AMPs), found in various leguminous seeds, exhibit broad-spectrum antimicrobial activity through diverse mechanisms, including interaction with microbial cell membranes and interference with cellular processes, making them valuable for enhancing crop resilience and food safety. In the field of plant sciences, computational biology methods have been instrumental in the discovery and optimization of AMPs. These methods enable rapid exploration of sequence space and the prediction of AMPs using deep learning technologies. Optimizing AMP annotations through computational design offers a strategic approach to enhance efficacy and minimize potential side effects, providing a viable alternative to conventional antimicrobial agents. However, the presence of overlapping sequences across multiple databases poses a challenge for creating a reliable dataset for AMP prediction. To address this, we conducted a comprehensive analysis of sequence redundancy across various AMP databases. These databases encompass a wide range of AMPs from different sources and with specific functions, including both naturally occurring and artificially synthesized AMPs. Our analysis revealed significant overlap, underscoring the need for a non-redundant AMP sequence database. We present the development of a new database that consolidates unique AMP sequences derived from leguminous seeds, aiming to create a more refined dataset for the binary classification and prediction of plant-derived AMPs. This database will support the advancement of sustainable agricultural practices by enhancing the use of plant-based AMPs in agroecology, contributing to improved crop protection and food security.
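The redundancy analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes redundancy means exact sequence identity after normalising case and whitespace, and the database names (`db_a`, `db_b`, `db_c`), the helper functions, and the peptide strings are all hypothetical placeholders rather than real database entries.

```python
from collections import defaultdict


def normalize(seq: str) -> str:
    """Remove whitespace and uppercase, so trivially different spellings compare equal."""
    return "".join(seq.split()).upper()


def consolidate(databases: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each unique normalised sequence to the set of databases containing it."""
    sources: dict[str, set[str]] = defaultdict(set)
    for db_name, sequences in databases.items():
        for seq in sequences:
            key = normalize(seq)
            if key:  # skip empty records
                sources[key].add(db_name)
    return dict(sources)


def redundancy_summary(sources: dict[str, set[str]]) -> dict[str, int]:
    """Count unique sequences and how many of them occur in more than one database."""
    shared = sum(1 for dbs in sources.values() if len(dbs) > 1)
    return {"unique": len(sources), "shared": shared}


# Toy input: placeholder strings, NOT real AMP records (assumption for illustration).
toy = {
    "db_a": ["KWKLFKKI", "gigkflhs", "ACDEFGHIK"],
    "db_b": ["GIGKFLHS", "KWKLFKKI", "LLGDFFRK"],
    "db_c": ["ACDEFGHIK ", "MNPQRSTV"],
}

sources = consolidate(toy)
summary = redundancy_summary(sources)
non_redundant = sorted(sources)  # the consolidated, duplicate-free sequence list
print(summary)
```

Exact matching is the simplest possible criterion. A real consolidation might instead cluster sequences at an identity threshold, which would also merge near-duplicates that this sketch keeps apart.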