SAMSVM: A tool for misalignment filtration of SAM-format sequences with support vector machine.

Authors

Yang Jianfeng, Ding Xiaofan, Sun Xing, Tsang Shui-Ying, Xue Hong

Affiliations

1 Division of Life Science, Applied Genomics Centre and Centre for Statistical Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China.

Publication information

J Bioinform Comput Biol. 2015 Dec;13(6):1550025. doi: 10.1142/S0219720015500250. Epub 2015 Aug 24.

DOI:10.1142/S0219720015500250
PMID:26419425
Abstract

Sequence alignment/map (SAM) formatted sequences [Li H, Handsaker B, Wysoker A et al., Bioinformatics 25(16):2078-2079, 2009.] have taken on a main role in bioinformatics since the development of massive parallel sequencing. However, because misalignment of sequences poses a significant problem in analysis of sequencing data that could lead to false positives in variant calling, the exclusion of misaligned reads is a necessity in analysis. In this regard, the multiple features of SAM-formatted sequences can be treated as vectors in a multi-dimension space to allow the application of a support vector machine (SVM). Applying the LIBSVM tools developed by Chang and Lin [Chang C-C, Lin C-J, ACM Trans Intell Syst Technol 2:1-27, 2011.] as a simple interface for support vector classification, the SAMSVM package has been developed in this study to enable misalignment filtration of SAM-formatted sequences. Cross-validation between two simulated datasets processed with SAMSVM yielded accuracies that ranged from 0.89 to 0.97 with F-scores ranging from 0.77 to 0.94 in 14 groups characterized by different mutation rates from 0.001 to 0.1, indicating that the model built using SAMSVM was accurate in misalignment detection. Application of SAMSVM to actual sequencing data resulted in filtration of misaligned reads and correction of variant calling.
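As a rough illustration of treating SAM records as vectors, the sketch below turns one alignment line into a numeric feature list, writes it in the sparse text format that LIBSVM reads, and computes the accuracy and F-score used to summarise the cross-validation. The choice of features (MAPQ, the NM edit-distance tag, soft-clipped bases, indel bases, template length, the properly-paired flag bit, read length) and the function names sam_features, to_libsvm, and accuracy_and_f1 are assumptions made for this example; the abstract does not say which SAM fields SAMSVM actually uses.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_features(line):
    """Turn one SAM alignment line into a list of numeric features.

    The feature set is illustrative only, not the one used by SAMSVM.
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])      # bitwise FLAG
    mapq = int(fields[4])      # mapping quality
    cigar = fields[5]          # CIGAR string, "*" if unavailable
    tlen = int(fields[8])      # observed template length
    seq = fields[9]            # read sequence

    # Sum the base counts for each CIGAR operation.
    ops = {}
    if cigar != "*":
        for count, op in CIGAR_RE.findall(cigar):
            ops[op] = ops.get(op, 0) + int(count)

    # Optional NM:i tag holds the edit distance to the reference.
    edit_distance = 0
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            edit_distance = int(tag[5:])

    return [
        mapq,
        edit_distance,
        ops.get("S", 0),                    # soft-clipped bases
        ops.get("I", 0) + ops.get("D", 0),  # inserted plus deleted bases
        abs(tlen),
        1 if flag & 0x2 else 0,             # properly paired bit
        len(seq),
    ]


def to_libsvm(label, features):
    """Format a labelled vector as '<label> <index>:<value> ...'.

    Indices are 1-based and zero-valued features are omitted.
    """
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0)
    return f"{label} {pairs}".rstrip()


def accuracy_and_f1(truth, predicted, positive=1):
    """Return (accuracy, F-score) for two equally long label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(truth, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(truth, predicted))
    correct = sum(t == p for t, p in zip(truth, predicted))
    accuracy = correct / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# A made-up paired-end read, labelled -1 here to stand for "misaligned".
record = "\t".join(
    ["r1", "99", "chr1", "100", "60", "90M10S", "=", "300", "300",
     "A" * 100, "I" * 100, "NM:i:2"]
)
print(to_libsvm(-1, sam_features(record)))
```

Lines written this way could be collected into training and test files for LIBSVM's command-line tools; any feature scaling and kernel selection would still have to be chosen separately.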

Similar articles

1
SAMSVM: A tool for misalignment filtration of SAM-format sequences with support vector machine.
J Bioinform Comput Biol. 2015 Dec;13(6):1550025. doi: 10.1142/S0219720015500250. Epub 2015 Aug 24.
2
A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data.
Bioinformatics. 2013 Jun 1;29(11):1361-6. doi: 10.1093/bioinformatics/btt172. Epub 2013 Apr 24.
3
Recalibration of mapping quality scores in Illumina short-read alignments improves SNP detection results in low-coverage sequencing data.
PeerJ. 2020 Dec 7;8:e10501. doi: 10.7717/peerj.10501. eCollection 2020.
4
RVMAB: Using the Relevance Vector Machine Model Combined with Average Blocks to Predict the Interactions of Proteins from Protein Sequences.
Int J Mol Sci. 2016 May 18;17(5):757. doi: 10.3390/ijms17050757.
5
Classification of imbalanced bioinformatics data by using boundary movement-based ELM.
Biomed Mater Eng. 2015;26 Suppl 1:S1855-62. doi: 10.3233/BME-151488.
6
Vecuum: identification and filtration of false somatic variants caused by recombinant vector contamination.
Bioinformatics. 2016 Oct 15;32(20):3072-3080. doi: 10.1093/bioinformatics/btw383. Epub 2016 Jun 22.
7
STR-realigner: a realignment method for short tandem repeat regions.
BMC Genomics. 2016 Dec 3;17(1):991. doi: 10.1186/s12864-016-3294-x.
8
Variant Calling From Next Generation Sequence Data.
Methods Mol Biol. 2016;1418:209-24. doi: 10.1007/978-1-4939-3578-9_11.
9
Review of alignment and SNP calling algorithms for next-generation sequencing data.
J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9.
10
A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts.
BMC Genomics. 2017 Oct 18;18(1):804. doi: 10.1186/s12864-017-4178-4.

Cited by

1
Pre-vaccination transcriptomic profiles of immune responders to the MUC1 peptide vaccine for colon cancer prevention.
Front Immunol. 2024 Oct 10;15:1437391. doi: 10.3389/fimmu.2024.1437391. eCollection 2024.
2
Recessive variants in MYO1C as a potential novel cause of proteinuric kidney disease.
Pediatr Nephrol. 2024 Oct;39(10):2939-2945. doi: 10.1007/s00467-024-06426-1. Epub 2024 Jun 21.
3
Pre-vaccination transcriptomic profiles of immune responders to the MUC1 peptide vaccine for colon cancer prevention.
medRxiv. 2024 May 10:2024.05.09.24305336. doi: 10.1101/2024.05.09.24305336.
4
Recessive variants in MYO1C as a potential novel cause of proteinuric kidney disease.
Res Sq. 2024 Apr 11:rs.3.rs-4183332. doi: 10.21203/rs.3.rs-4183332/v1.
5
Candidate probiotic Lactiplantibacillus plantarum HNU082 rapidly and convergently evolves within human, mice, and zebrafish gut but differentially influences the resident microbiome.
Microbiome. 2021 Jun 30;9(1):151. doi: 10.1186/s40168-021-01102-0.
6
Contribution of sarcomere gene mutations to left atrial function in patients with hypertrophic cardiomyopathy.
Cardiovasc Ultrasound. 2021 Jan 6;19(1):4. doi: 10.1186/s12947-020-00233-y.
7
Recalibration of mapping quality scores in Illumina short-read alignments improves SNP detection results in low-coverage sequencing data.
PeerJ. 2020 Dec 7;8:e10501. doi: 10.7717/peerj.10501. eCollection 2020.
8
Genetic relevance and determinants of mitral leaflet size in hypertrophic cardiomyopathy.
Cardiovasc Ultrasound. 2019 Oct 28;17(1):21. doi: 10.1186/s12947-019-0171-1.
9
, , , , , and as candidate genes for differentiating multilocular cystic renal neoplasm of low malignant potential from clear cell renal cell carcinoma with cystic change.
Investig Clin Urol. 2019 May;60(3):148-155. doi: 10.4111/icu.2019.60.3.148. Epub 2019 Apr 1.
10
Establishment and characterization of new tumor xenografts and cancer cell lines from EBV-positive nasopharyngeal carcinoma.
Nat Commun. 2018 Nov 7;9(1):4663. doi: 10.1038/s41467-018-06889-5.