Yang Y, Chute C G
Section of Medical Information Resources, Mayo Clinic/Foundation, Rochester, Minnesota 55905, USA.
Proc Annu Symp Comput Appl Med Care. 1995:32-6.
This paper studies sampling strategies for the Expert Network (ExpNet), a statistical learning system used for patient record classification at the Mayo Clinic. The goal is to achieve high-accuracy classification at an affordable computational cost in very large applications. The learning curves of ExpNet were observed with respect to the choice of training resources; the size, vocabulary coverage, and category coverage of the training set; and the category distribution over training instances. A method combining the advantages of different sampling strategies is proposed and evaluated using a large training corpus. As a result, ExpNet achieved near-optimal classification accuracy (measured by average precision) using a relatively small training set, with a fast real-time response that satisfies the needs of human-machine interaction.
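The abstract names average precision as its accuracy measure but does not define it. As a rough illustration only, the sketch below computes the common non-interpolated average precision for a single ranked list of candidate categories. The function name, the category labels, and the choice of the non-interpolated variant are all assumptions for illustration; the paper may use a different variant (for example, an interpolated one averaged over recall levels).

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one ranked category list.

    ranked:   categories proposed by a classifier, best guess first
              (hypothetical labels, not ExpNet output).
    relevant: the categories a human coder actually assigned.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, category in enumerate(ranked, start=1):
        if category in relevant:
            hits += 1
            # Precision at the rank where a correct category appears.
            precision_sum += hits / rank
    # Correct categories that were never ranked contribute zero.
    return precision_sum / len(relevant)


if __name__ == "__main__":
    # Hypothetical ranking with correct categories at ranks 1 and 3.
    score = average_precision(["cat_a", "cat_b", "cat_c", "cat_d"], {"cat_a", "cat_c"})
    print(round(score, 4))
```

Averaging this per-record score over a test set gives a single figure that can be plotted against training set size, which is one way a learning curve like those described above could be drawn.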