Jeremy Straub
Department of Computer Science, North Dakota State University.
MethodsX. 2021 Aug 2;8:101477. doi: 10.1016/j.mex.2021.101477. eCollection 2021.
A method is proposed for generating application-domain-agnostic data for training and evaluating machine learning systems. The proposed method randomly generates an expert system network based on user-specified parameters. This expert system serves as a generic model of an unspecified phenomenon. The expert system is run to determine the ideal output for a set of random inputs, and these inputs and ideal outputs are used to train and test a machine learning system. This allows a machine learning technology to be developed and tested without requiring compatible test data to be collected, or before data collection as a proof-of-concept validation of system operations. It also allows systems to be tested without data error noise, or with known levels of noise and other perturbations, to facilitate analysis. It may also facilitate testing system security and adversarial attacks, as well as other types of research into machine learning systems.
•Provides an application-domain-agnostic way to test machine learning technologies and facilitates the generalization of results.
•Allows technologies to be tested with data of different characteristics without having to locate datasets that have these characteristics.
•Utilizes a randomly generated network to represent non-specific phenomena, which can be used for training and testing machine learning techniques.
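The generate, run, and label loop described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a hypothetical network of facts (values in [0, 1]) and rules, where each rule combines two earlier facts by a weighted average (weights summing to one) to produce a new fact, and the last fact produced is taken as the output. The names `Rule`, `generate_network`, `run_network`, `generate_dataset`, `split_dataset`, and the parameters `num_inputs`, `num_rules`, `noise_std`, and `test_fraction` are hypothetical illustrations.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule form (assumption): new fact = weight * a + (1 - weight) * b
    fact_a: int
    fact_b: int
    weight: float
    output_fact: int


def generate_network(num_inputs, num_rules, seed=None):
    """Randomly build an acyclic rule network from user-specified sizes.

    Facts 0..num_inputs-1 are inputs; rule i creates fact num_inputs + i
    from two distinct earlier facts, so rules can be evaluated in order.
    """
    if num_inputs < 2 or num_rules < 1:
        raise ValueError("need at least 2 inputs and 1 rule")
    rng = random.Random(seed)
    rules = []
    for i in range(num_rules):
        new_fact = num_inputs + i
        a, b = rng.sample(range(new_fact), 2)  # two distinct existing facts
        rules.append(Rule(a, b, rng.random(), new_fact))
    return rules


def run_network(rules, inputs):
    """Evaluate the rules in order and return the last fact as the ideal output."""
    facts = list(inputs)
    for rule in rules:
        facts.append(rule.weight * facts[rule.fact_a]
                     + (1.0 - rule.weight) * facts[rule.fact_b])
    return facts[-1]


def generate_dataset(rules, num_inputs, num_samples, noise_std=0.0, seed=None):
    """Pair random inputs with the network's ideal output.

    The ideal output is computed from the clean inputs; if noise_std > 0,
    Gaussian noise of that known level is then added to the inputs and
    clipped so each value stays in [0, 1].
    """
    rng = random.Random(seed)
    data = []
    for _ in range(num_samples):
        x = [rng.random() for _ in range(num_inputs)]
        y = run_network(rules, x)
        if noise_std > 0:
            x = [min(1.0, max(0.0, v + rng.gauss(0.0, noise_std))) for v in x]
        data.append((x, y))
    return data


def split_dataset(data, test_fraction=0.2):
    """Split samples into training and testing sets (samples are already random)."""
    cut = len(data) - round(len(data) * test_fraction)
    return data[:cut], data[cut:]


if __name__ == "__main__":
    network = generate_network(num_inputs=5, num_rules=20, seed=1)
    samples = generate_dataset(network, num_inputs=5, num_samples=1000, seed=2)
    train, test = split_dataset(samples)
    print(len(train), len(test))  # 800 200
```

Because the ideal output is computed before any noise is applied, `noise_std=0.0` yields noise-free data and a positive value yields data with a known, controllable noise level. The sketch does not enforce that every input influences the output; a fuller generator might add that as a constraint.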