Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns.

Author information

Abbasi Ahtisham Fazeel, Asim Muhammad Nabeel, Ahmed Sheraz, Dengel Andreas

Affiliations

Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, 67663, Kaiserslautern, Germany.

German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany.

Publication information

Sci Rep. 2024 Apr 24;14(1):9466. doi: 10.1038/s41598-024-57457-5.

Abstract

Long extrachromosomal circular DNA (leccDNA) regulates several biological processes, such as genomic instability, gene amplification, and oncogenesis. Identifying leccDNA is important for investigating its potential associations with cancer and with autoimmune, cardiovascular, and neurological diseases, and understanding these associations can provide valuable insights into disease mechanisms and potential therapeutic approaches. Conventionally, leccDNA is identified with wet-lab methods, which require prior knowledge and resource-intensive processes, potentially limiting their broader applicability. To support leccDNA identification across multiple species, this paper presents the first computational predictor for the task. The proposed iLEC-DNA predictor uses a support vector machine (SVM) classifier with sequence-derived features that capture nucleotide distribution patterns and physicochemical properties. In addition, the study introduces a set of 12 benchmark leccDNA datasets covering three species: Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). Large-scale experiments are performed on these 12 benchmark datasets under different experimental settings, using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. Across diverse leccDNA datasets, the proposed predictor outperforms the baseline predictors and encoder ensembles, achieving average values of 81.09%, 62.2% and 81.08% in terms of accuracy (ACC), Matthews correlation coefficient (MCC) and area under the ROC curve (AUC-ROC) across all datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate use by the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
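
The abstract names two feature families (nucleotide distribution patterns and physicochemical properties) that are fused and passed to an SVM, and reports ACC, MCC and AUC-ROC. The exact encoders used by iLEC-DNA are not described here, so the following is only a minimal Python sketch under stated assumptions: k-mer frequencies stand in for the distribution-pattern features, a three-property chemical encoding (purine ring, amino group, strong G-C hydrogen bonding) stands in for the physicochemical features, and fusion is plain concatenation. All function names are hypothetical and are not taken from the iLEC-DNA repository.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"

# Hypothetical physicochemical encoding (NCP-style), one binary value per property:
# ring structure (purine = 1), functional group (amino = 1),
# hydrogen bonding with the complementary base (strong, 3 bonds = 1).
PURINE = {"A": 1, "G": 1, "C": 0, "T": 0}
AMINO = {"A": 1, "C": 1, "G": 0, "T": 0}
STRONG_HBOND = {"G": 1, "C": 1, "A": 0, "T": 0}


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Normalized counts of all 4**k k-mers (a nucleotide distribution pattern)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def chemical_property_profile(seq: str) -> list[float]:
    """Mean of each binary property over valid bases; the last value equals GC content."""
    valid = [b for b in seq if b in PURINE]
    n = len(valid) or 1
    return [sum(table[b] for b in valid) / n for table in (PURINE, AMINO, STRONG_HBOND)]


def encode(seq: str, k: int = 2) -> list[float]:
    """Fuse both feature groups by simple concatenation (an assumed fusion strategy)."""
    seq = seq.upper()
    return kmer_frequencies(seq, k) + chemical_property_profile(seq)


def acc_mcc(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Accuracy and Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


def auc_roc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    features = encode("ACGTGGCC", k=2)
    print(len(features))  # 19 = 16 dinucleotide frequencies + 3 property means
```

In practice, the vectors returned by `encode` could be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`, whose `decision_function` scores could then be fed to `auc_roc`. Concatenation is the simplest fusion choice; the paper's actual fusion and SVM settings may differ.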

[Figure 1]
