Weight of authorship evidence with multiple categories of stylometric features: A multinomial-based discrete model.

Affiliation

Speech and Language Laboratory, The Australian National University, Building #9, Canberra, ACT 2600, Australia; Linguistics Program, School of Culture, History and Language, College of Asia and the Pacific, The Australian National University, Building #9, Canberra, ACT 2600, Australia.

Publication information

Sci Justice. 2023 Mar;63(2):181-199. doi: 10.1016/j.scijus.2022.12.007. Epub 2023 Jan 3.

DOI:10.1016/j.scijus.2022.12.007
PMID:36870699
Abstract

This study empirically demonstrates the efficacy of a two-level Dirichlet-multinomial statistical model (the Multinomial system) for computing likelihood ratios (LRs) for linguistic text evidence with multiple stylometric feature types that take discrete values. The LRs are calculated separately for each feature type, namely word, character and part-of-speech N-grams (N = 1, 2, 3), and are then combined into overall LRs through logistic regression fusion. The Multinomial system's performance is compared with that of a previously proposed system based on cosine distance (the Cosine system) using the same data (i.e., documents collated from 2160 authors). The experimental results show that: (1) with the fused feature types, the Multinomial system outperforms the Cosine system by a log-LR cost of approximately 0.01 to 0.05 bits; and (2) the Multinomial system has a greater performance advantage over the Cosine system with longer documents. Although the Cosine system is more robust overall against the sampling variability arising from the number of authors included in the reference and calibration databases, the Multinomial system can achieve reasonable stability in performance; for example, the standard deviation of the log-LR cost falls below 0.01 (over 10 random samplings of authors for the reference and calibration databases) with 60 or more authors in each database.
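
The abstract reports results as a log-LR cost in bits but does not define the metric. As a rough illustration, the sketch below assumes the standard definition of the log-likelihood-ratio cost (Cllr) commonly used in forensic LR evaluation; it is not taken from the paper. The function names and the input values are hypothetical, and the LRs are assumed to be supplied as natural-log LRs.

```python
import math


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cllr(same_author_llrs, diff_author_llrs):
    """Log-LR cost in bits (assumed standard Cllr definition).

    Inputs are natural-log LRs from same-author and different-author
    comparisons. Lower is better; a system that always outputs
    LR = 1 (log-LR = 0) scores exactly 1 bit.
    """
    if not same_author_llrs or not diff_author_llrs:
        raise ValueError("both lists of log-LRs must be non-empty")
    # Penalty for same-author comparisons: log(1 + 1/LR)
    same_pen = sum(_softplus(-x) for x in same_author_llrs) / len(same_author_llrs)
    # Penalty for different-author comparisons: log(1 + LR)
    diff_pen = sum(_softplus(x) for x in diff_author_llrs) / len(diff_author_llrs)
    # Average the two penalties and convert from nats to bits
    return 0.5 * (same_pen + diff_pen) / math.log(2)


if __name__ == "__main__":
    # Hypothetical log-LRs, for illustration only (not data from the study)
    same = [2.1, 0.8, 3.5, -0.2]
    diff = [-1.9, -3.0, 0.4, -2.2]
    print(f"Cllr = {cllr(same, diff):.3f} bits")
```

Under this definition, values below 1 bit indicate that the LRs carry useful information, so a difference of 0.01 to 0.05 bits between two systems is a modest but measurable gap.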

