SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning.

Affiliations

Department of Computer Science, Rice University, Houston, TX, USA.

Signature Science, LLC, 8329 North Mopac Expressway, Austin, TX, USA.

Publication information

Genome Biol. 2022 Jun 20;23(1):133. doi: 10.1186/s13059-022-02695-x.

DOI: 10.1186/s13059-022-02695-x
PMID: 35725628
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9208262/
Abstract

The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, and is available for download at www.gitlab.com/treangenlab/seqscreen.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17be/9210734/7e394e26f5f4/13059_2022_2695_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17be/9210734/bad7bf6ac6fb/13059_2022_2695_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17be/9210734/01e5ddd1e0a4/13059_2022_2695_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17be/9210734/6182aa12b18a/13059_2022_2695_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17be/9210734/926f3c1d5617/13059_2022_2695_Fig5_HTML.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17be/9210734/30883f5a2c8b/13059_2022_2695_Fig6_HTML.jpg

Similar articles

1
SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning.
Genome Biol. 2022 Jun 20;23(1):133. doi: 10.1186/s13059-022-02695-x.
2
Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection.
Bioinformatics. 2022 Sep 16;38(Suppl_2):ii168-ii174. doi: 10.1093/bioinformatics/btac495.
3
csORF-finder: an effective ensemble learning framework for accurate identification of multi-species coding short open reading frames.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac392.
4
Categorizing Sequences of Concern by Function To Better Assess Mechanisms of Microbial Pathogenesis.
Infect Immun. 2022 May 19;90(5):e0033421. doi: 10.1128/IAI.00334-21. Epub 2021 Nov 15.
5
Long non-coding RNAs in biomarking COVID-19: a machine learning-based approach.
Virol J. 2024 Jun 7;21(1):134. doi: 10.1186/s12985-024-02408-9.
6
A new hybrid ensemble machine-learning model for severity risk assessment and post-COVID prediction system.
Math Biosci Eng. 2022 Apr 13;19(6):6102-6123. doi: 10.3934/mbe.2022285.
7
PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment.
Nat Commun. 2021 Feb 26;12(1):1167. doi: 10.1038/s41467-021-21180-w.
8
Prediction of plant lncRNA by ensemble machine learning classifiers.
BMC Genomics. 2018 May 2;19(1):316. doi: 10.1186/s12864-018-4665-2.
9
Rescuing low frequency variants within intra-host viral populations directly from Oxford Nanopore sequencing data.
Nat Commun. 2022 Mar 14;13(1):1321. doi: 10.1038/s41467-022-28852-1.
10
Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study.
J Med Internet Res. 2021 Feb 26;23(2):e23458. doi: 10.2196/23458.

Cited by

1
Inter-tool analysis of a NIST dataset for assessing baseline nucleic acid sequence screening.
bioRxiv. 2025 Jun 1:2025.05.30.655379. doi: 10.1101/2025.05.30.655379.
2
A Comparison of Methods for the Optimal Recovery of the Human Fecal Virome.
medRxiv. 2025 May 13:2025.05.12.25327428. doi: 10.1101/2025.05.12.25327428.
3
De novo virulence feature discovery and risk assessment in Klebsiella pneumoniae based on microbial genome vectorization.
Commun Biol. 2025 Apr 17;8(1):623. doi: 10.1038/s42003-025-07678-9.
4
New and revised gene ontology biological process terms describe multiorganism interactions critical for understanding microbial pathogenesis and sequences of concern.
J Biomed Semantics. 2025 Mar 21;16(1):4. doi: 10.1186/s13326-025-00323-8.
5
Annotation of Functions of Sequences of Concern and Its Relevance to the New Biosecurity Regulatory Framework in the United States.
Appl Biosaf. 2024 Sep 18;29(3):142-149. doi: 10.1089/apb.2023.0030. eCollection 2024 Sep.
6
Practical Questions for Securing Nucleic Acid Synthesis.
Appl Biosaf. 2024 Sep 18;29(3):159-171. doi: 10.1089/apb.2023.0028. eCollection 2024 Sep.
7
Enhancing Gene Synthesis Security: An Updated Framework for Synthetic Nucleic Acid Screening and the Responsible Use of Synthetic Biological Materials.
Appl Biosaf. 2024 Jun 20;29(2):63-70. doi: 10.1089/apb.2023.0036. eCollection 2024 Jun.
8
Microbial biofilms on macroalgae harbour diverse integron gene cassettes.
Microbiology (Reading). 2024 Mar;170(3). doi: 10.1099/mic.0.001446.
9
Beyond Biosecurity by Taxonomic Lists: Lessons, Challenges, and Opportunities.
Health Secur. 2023 Nov-Dec;21(6):521-529. doi: 10.1089/hs.2022.0109. Epub 2023 Oct 19.
10
Safety by design: Biosafety and biosecurity in the age of synthetic genomics.
iScience. 2023 Feb 10;26(3):106165. doi: 10.1016/j.isci.2023.106165. eCollection 2023 Mar 17.

References cited by this article

1
Performance Characteristics of Next-Generation Sequencing for the Detection of Antimicrobial Resistance Determinants in Escherichia coli Genomes and Metagenomes.
mSystems. 2022 Jun 28;7(3):e0002222. doi: 10.1128/msystems.00022-22. Epub 2022 Jun 1.
2
Critical Assessment of Metagenome Interpretation: the second round of challenges.
Nat Methods. 2022 Apr;19(4):429-440. doi: 10.1038/s41592-022-01431-4. Epub 2022 Apr 8.
3
Categorizing Sequences of Concern by Function To Better Assess Mechanisms of Microbial Pathogenesis.
Infect Immun. 2022 May 19;90(5):e0033421. doi: 10.1128/IAI.00334-21. Epub 2021 Nov 15.
4
Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3.
Elife. 2021 May 4;10:e65088. doi: 10.7554/eLife.65088.
5
Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences.
Sci Rep. 2021 Feb 25;11(1):4565. doi: 10.1038/s41598-021-83922-6.
6
PathoFact: a pipeline for the prediction of virulence factors and antimicrobial resistance genes in metagenomic data.
Microbiome. 2021 Feb 17;9(1):49. doi: 10.1186/s40168-020-00993-9.
7
Ultrafast and accurate 16S rRNA microbial community analysis using Kraken 2.
Microbiome. 2020 Aug 28;8(1):124. doi: 10.1186/s40168-020-00900-2.
8
Synthetic DNA and biosecurity: Nuances of predicting pathogenicity and the impetus for novel computational approaches for screening oligonucleotides.
PLoS Pathog. 2020 Aug 6;16(8):e1008649. doi: 10.1371/journal.ppat.1008649. eCollection 2020 Aug.
9
Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients.
Emerg Microbes Infect. 2020 Dec;9(1):761-770. doi: 10.1080/22221751.2020.1747363.
10
Interpretable and accurate prediction models for metagenomics data.
Gigascience. 2020 Mar 1;9(3). doi: 10.1093/gigascience/giaa010.