Department of Computer Science and Engineering, University of Nebraska-Lincoln, 358 Avery Hall, Lincoln, NE 68588, USA.
Artif Intell Med. 2010 Jun;49(2):79-91. doi: 10.1016/j.artmed.2010.02.003. Epub 2010 Apr 8.
OBJECTIVE: We propose classification integration as a new method for integrating data from different sources. We also propose reclassification as a new method of combining existing medical classifications for different classes.
BACKGROUND: In many problems the raw data are already classified according to a set of features but need to be reclassified. Data reclassification is usually achieved using data integration methods that require the raw data, which may not be available or sharable because of privacy and legal concerns.
METHODOLOGY: We introduce general classification integration and reclassification methods that create new classes by flexibly combining the existing classes, without requiring access to the raw data. The flexibility is achieved by representing any linear classification in a constraint database.
RESULTS: Experiments using support vector machines and decision trees on heart disease diagnosis and primary biliary cirrhosis data show that our classification integration method is more accurate than current data integration methods when there are many missing values in the data. The reclassification problem can also be solved using constraint databases without requiring access to the raw data.
CONCLUSIONS: The classification integration and reclassification methods are applied to two particular data sets. Besides these particular cases, our general method is also appropriate for many other application areas and may yield similar accuracy improvements. These methods may also be extended to non-linear classifiers.
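The idea of storing a linear classification as constraints and combining existing classes without the raw data can be sketched as follows. This is a minimal illustration, not the authors' implementation: plain Python stands in for a constraint database, and the names `Constraint`, `linear_classification`, `integrate`, and `classify`, as well as all feature names and weights, are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Constraint:
    """One linear constraint over named features (hypothetical representation)."""
    coeffs: tuple          # (feature_name, weight) pairs
    bias: float
    positive: bool = True  # True: expression >= 0, False: expression < 0

    def holds(self, x):
        value = sum(w * x[f] for f, w in self.coeffs) + self.bias
        return value >= 0 if self.positive else value < 0


def linear_classification(name, coeffs, bias):
    """Store a two-class linear classifier as one constraint list per class."""
    pairs = tuple(sorted(coeffs.items()))
    return {
        f"{name}=yes": [Constraint(pairs, bias, True)],
        f"{name}=no": [Constraint(pairs, bias, False)],
    }


def integrate(first, second):
    """Build one new class per pair of existing classes by conjoining constraints."""
    return {
        f"{a} & {b}": first[a] + second[b]
        for a, b in product(first, second)
    }


def classify(classes, x):
    """Return every class label whose constraints all hold for the record x."""
    return [label for label, cs in classes.items() if all(c.holds(x) for c in cs)]


# Illustrative weights only; they are not taken from the paper.
heart = linear_classification("heart_disease", {"age": 0.05, "chol": 0.01}, -5.0)
liver = linear_classification("pbc", {"bilirubin": 1.0, "age": 0.02}, -3.0)
combined = integrate(heart, liver)

patient = {"age": 60, "chol": 250, "bilirubin": 1.2}
labels = classify(combined, patient)
print(labels)
```

Only the classifier coefficients are needed to build `combined`; no training records are touched, which is the property the abstract emphasizes. Handling of missing feature values is deliberately left out of this sketch.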