Feature selection with a genetic algorithm can help improve the distinguishing power of microbiota information in monozygotic twins' identification.

Authors

Fu Guangping, Ma Guanju, Dou Shujie, Wang Qian, Fu Lihong, Zhang Xiaojing, Lu Chaolong, Cong Bin, Li Shujin

Affiliations

College of Forensic Medicine, Hebei Medical University, Hebei Key Laboratory of Forensic Medicine, Research Unit of Digestive Tract Microecosystem Pharmacology and Toxicology, Chinese Academy of Medical Sciences, Shijiazhuang, China.

Hainan Tropical Forensic Medicine Academician Workstation, Haikou, China.

Publication information

Front Microbiol. 2023 Jul 24;14:1210638. doi: 10.3389/fmicb.2023.1210638. eCollection 2023.

Abstract

INTRODUCTION

Personal identification of monozygotic twins (MZT) has long been a challenge in forensic genetics. Previous research has demonstrated that microbial markers have potential value due to their specificity and long-term stability. However, those studies used the complete information of the detected microbial communities, and low-value species limited the performance of their models.

METHODS

To address this issue, we collected 80 saliva samples from 10 pairs of MZTs at four different time points and used 16S rRNA V3-V4 region sequencing to obtain microbiota information. The data formed 280 intra-individual (Self) or MZT sample pairs, which were divided into four groups based on individual relationship and time interval and then randomly split into training and testing sets at an 8:2 ratio. We built 12 identification models based on the time interval (≤ 1 year or ≥ 2 months), data basis (amplicon sequence variants, ASVs, or operational taxonomic units, OTUs), and distance parameter (Jaccard distance, Bray-Curtis distance, or Hellinger distance) and then improved their identification power through a genetic algorithm. For each of the two time-interval types, the best combination of data basis and distance parameter was selected as the final model. Bayesian theory was introduced to provide a numerical indicator of the evidence's effectiveness in practical cases.
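The counts in this paragraph can be checked with a little arithmetic, and the three distance measures are easy to state concretely. The sketch below is illustrative only and is not the authors' code: the sample labels and the two toy abundance vectors are invented, and the Hellinger distance uses the probabilistic convention scaled to [0, 1] (some ecology packages omit the 1/sqrt(2) factor).

```python
from itertools import combinations, product
from math import sqrt

# Study design as stated in the abstract: 10 twin pairs, 4 time points each.
N_TWIN_PAIRS, N_TIMEPOINTS = 10, 4

# Hypothetical sample labels: (family, twin, time point).
samples = [(fam, twin, t)
           for fam in range(N_TWIN_PAIRS)
           for twin in (0, 1)
           for t in range(N_TIMEPOINTS)]

# Self pairs: two samples from the same individual.
self_pairs = [(a, b) for a, b in combinations(samples, 2)
              if a[0] == b[0] and a[1] == b[1]]
# MZT pairs: one sample from each twin of the same family.
mzt_pairs = [(a, b) for a, b in combinations(samples, 2)
             if a[0] == b[0] and a[1] != b[1]]

# 2 time-interval types x 2 data bases x 3 distances = 12 models.
model_grid = list(product(("interval type 1", "interval type 2"),
                          ("ASV", "OTU"),
                          ("Jaccard", "Bray-Curtis", "Hellinger")))


def jaccard_distance(x, y):
    """1 - shared taxa / taxa present in either sample (presence/absence)."""
    present_x = {i for i, v in enumerate(x) if v > 0}
    present_y = {i for i, v in enumerate(y) if v > 0}
    union = present_x | present_y
    if not union:
        return 0.0
    return 1.0 - len(present_x & present_y) / len(union)


def bray_curtis_distance(x, y):
    """Sum of absolute count differences divided by the total count."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def hellinger_distance(x, y):
    """Euclidean distance between square-rooted relative abundances,
    divided by sqrt(2) so the result lies in [0, 1]."""
    sx, sy = sum(x), sum(y)
    px = [v / sx for v in x]
    py = [v / sy for v in y]
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(px, py))) / sqrt(2)


# Invented toy abundance vectors over four taxa.
sample_a = [10, 0, 5, 5]
sample_b = [5, 5, 0, 10]

print(len(samples), len(self_pairs), len(mzt_pairs), len(model_grid))
print(jaccard_distance(sample_a, sample_b),
      bray_curtis_distance(sample_a, sample_b),
      hellinger_distance(sample_a, sample_b))
```

Each individual contributes C(4,2) = 6 Self pairs (120 in total) and each twin pair contributes 4 x 4 = 16 MZT pairs (160 in total), which together give the 280 pairs reported above.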

RESULTS

From the 80 saliva samples, 369 OTUs and 1130 ASVs were detected. After the feature selection process, ASV-Jaccard distance models were selected as the final models for the two types of time intervals. For short interval samples, the final model can completely distinguish MZT pairs from Self ones in both training and test sets.
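To make the idea of genetic-algorithm feature selection concrete, here is a minimal sketch on simulated presence/absence data. Everything in it is an assumption made for illustration: the toy data generator, the AUC-style fitness, and the GA settings (tournament selection, uniform crossover, bit-flip mutation, elitism) are not taken from the paper, whose actual procedure is not described in this abstract.

```python
import random
from itertools import combinations

rng = random.Random(0)
N_FAMILIES, N_TIMES, N_FEATURES, N_INFORMATIVE = 6, 3, 30, 10


def simulate():
    """Toy profiles: the first N_INFORMATIVE taxa are stable within an
    individual; the remaining taxa flip at random in every sample (noise)."""
    samples = []  # entries are (family, twin, profile)
    for fam in range(N_FAMILIES):
        for twin in (0, 1):
            core = [rng.random() < 0.5 for _ in range(N_INFORMATIVE)]
            for _ in range(N_TIMES):
                noise = [rng.random() < 0.5
                         for _ in range(N_FEATURES - N_INFORMATIVE)]
                samples.append((fam, twin, core + noise))
    return samples


def jaccard(a, b, mask):
    """Jaccard distance restricted to the features switched on in mask."""
    inter = sum(1 for x, y, m in zip(a, b, mask) if m and x and y)
    union = sum(1 for x, y, m in zip(a, b, mask) if m and (x or y))
    return 1.0 - inter / union if union else 0.0


def fitness(mask, self_pairs, mzt_pairs):
    """Share of (Self, MZT) pair combinations in which the Self distance
    is smaller than the MZT distance; ties count one half."""
    d_self = [jaccard(a, b, mask) for a, b in self_pairs]
    d_mzt = [jaccard(a, b, mask) for a, b in mzt_pairs]
    wins = sum((s < m) + 0.5 * (s == m) for s in d_self for m in d_mzt)
    return wins / (len(d_self) * len(d_mzt))


def evolve(self_pairs, mzt_pairs, pop_size=20, generations=30, p_mut=0.05):
    cache = {}

    def score(mask):
        if mask not in cache:
            cache[mask] = fitness(mask, self_pairs, mzt_pairs)
        return cache[mask]

    # Start from the full feature set plus random subsets.
    population = [tuple([True] * N_FEATURES)] + [
        tuple(rng.random() < 0.5 for _ in range(N_FEATURES))
        for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=score)  # tournament selection
            p2 = max(rng.sample(ranked, 3), key=score)
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_gen.append(tuple(child))
        population = next_gen
    return max(population, key=score)


def split(pairs, ratio=0.8):
    """Random 8:2 split into training and testing pairs."""
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = int(ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


samples = simulate()
self_pairs = [(a[2], b[2]) for a, b in combinations(samples, 2)
              if a[:2] == b[:2]]
mzt_pairs = [(a[2], b[2]) for a, b in combinations(samples, 2)
             if a[0] == b[0] and a[1] != b[1]]
train_self, test_self = split(self_pairs)
train_mzt, test_mzt = split(mzt_pairs)

all_features = tuple([True] * N_FEATURES)
best = evolve(train_self, train_mzt)
baseline_train = fitness(all_features, train_self, train_mzt)
best_train = fitness(best, train_self, train_mzt)
best_test = fitness(best, test_self, test_mzt)
print("features kept:", sum(best), "of", N_FEATURES)
print("train score, all features vs selected:", baseline_train, best_train)
print("test score, selected:", best_test)
```

Because the full feature set is placed in the initial population and the best masks are carried over unchanged, the selected subset can never score worse on the training pairs than using every feature; whether it also helps on the held-out pairs has to be checked separately, as the abstract does with its test set.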

DISCUSSION

Our findings support the microbiota solution to the challenging MZT identification problem and highlight the importance of feature selection in improving model performance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ddb/10406218/c3dd46f6ddc6/fmicb-14-1210638-g0001.jpg
