
Modeling Tabular Data using Conditional GAN


Abstract

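Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, which makes modeling difficult. Existing statistical and deep neural network models often fail to model this type of data properly. We therefore design TGAN (Tabular Generative Adversarial Network), which uses a conditional generative adversarial network to address these challenges. To enable a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and select several Bayesian networks as baselines. The results show that TGAN outperforms the Bayesian methods on most of the real datasets, whereas other deep learning methods do not.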
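The abstract names imbalanced discrete columns as one of the difficulties a conditional GAN has to handle. Below is a minimal sketch of one way to build the condition fed to the generator, assuming a CTGAN-style "training-by-sampling" scheme that this page does not itself describe: pick a discrete column uniformly at random, pick one of its categories with probability proportional to log(1 + count) instead of the raw count, and encode the choice as a one-hot conditional vector over all discrete columns. The helper names `category_probs` and `sample_condition` and the toy column data are hypothetical.

```python
import math
import random
from collections import Counter


def category_probs(column):
    """Return {category: probability} using log(1 + count) weights.

    Assumption: log-frequency weighting as a way to counter imbalance.
    It flattens the distribution, so rare categories are drawn more often
    than their raw frequency; the +1 keeps singleton categories above zero.
    """
    counts = Counter(column)
    weights = {cat: math.log(1 + n) for cat, n in counts.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


def sample_condition(discrete_columns, rng):
    """Pick a discrete column uniformly, then one of its categories by log frequency.

    Returns (column_name, category, cond_vector). cond_vector concatenates
    one-hot blocks for every discrete column (columns and categories in
    sorted order) and holds a single 1.0 at the chosen category.
    """
    names = sorted(discrete_columns)
    chosen = rng.choice(names)
    probs = category_probs(discrete_columns[chosen])
    cats = sorted(probs)
    category = rng.choices(cats, weights=[probs[c] for c in cats], k=1)[0]

    cond_vector = []
    for name in names:
        for c in sorted(set(discrete_columns[name])):
            cond_vector.append(1.0 if (name == chosen and c == category) else 0.0)
    return chosen, category, cond_vector


if __name__ == "__main__":
    # Hypothetical toy data: "workclass" is heavily imbalanced (90 / 9 / 1).
    columns = {
        "workclass": ["private"] * 90 + ["self-emp"] * 9 + ["never-worked"],
        "sex": ["male"] * 60 + ["female"] * 40,
    }
    print(category_probs(columns["workclass"]))
    print(sample_condition(columns, random.Random(0)))
```

With 90/9/1 counts, the rare category's sampling probability rises from 0.01 under raw frequency to about 0.09 under log weighting, so the generator is conditioned on it far more often during training.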

Benchmarks

| Benchmark | Method | Metric | DT | LR | RF | Parameters (M) |
|---|---|---|---|---|---|---|
| tabular-data-generation-on-adult-census | CopulaGAN | Accuracy | 76.29 | 80.61 | 80.46 | 0.300 |
| tabular-data-generation-on-adult-census | TVAE | Accuracy | 82.8 | 80.53 | 83.48 | 0.053 |
| tabular-data-generation-on-adult-census | CTGAN | Accuracy | 81.32 | 83.2 | 83.53 | 0.302 |
| tabular-data-generation-on-california-housing | CTGAN | Mean Squared Error | 0.82 | 0.61 | 0.62 | 0.197 |
| tabular-data-generation-on-california-housing | TVAE | Mean Squared Error | 0.45 | 0.65 | 0.35 | 0.045 |
| tabular-data-generation-on-california-housing | CopulaGAN | Mean Squared Error | 1.19 | 0.98 | 0.99 | 0.201 |
| tabular-data-generation-on-diabetes | CTGAN | Accuracy | 0.4973 | 0.5093 | 0.5223 | 9.6 |
| tabular-data-generation-on-diabetes | TVAE | Accuracy | 0.5330 | 0.5634 | 0.5517 | 0.359 |
| tabular-data-generation-on-diabetes | CopulaGAN | Accuracy | 0.385 | 0.4027 | 0.3759 | 9.4 |
| tabular-data-generation-on-heloc | CTGAN | Accuracy | 61.34 | 57.72 | 62.35 | 0.277 |
| tabular-data-generation-on-heloc | TVAE | Accuracy | 76.39 | 71.04 | 77.24 | 62 |
| tabular-data-generation-on-heloc | CopulaGAN | Accuracy | 42.36 | 42.03 | 42.35 | 0.276 |
| tabular-data-generation-on-sick | CopulaGAN | Accuracy | 93.77 | 94.57 | 94.57 | 0.226 |
| tabular-data-generation-on-sick | TVAE | Accuracy | 95.39 | 94.7 | 94.91 | 0.046 |
| tabular-data-generation-on-sick | CTGAN | Accuracy | 92.05 | 94.44 | 94.57 | 0.222 |
| tabular-data-generation-on-travel | CTGAN | Accuracy | 73.3 | 73.3 | 71.41 | 0.155 |
| tabular-data-generation-on-travel | TVAE | Accuracy | 81.68 | 79.58 | 81.68 | 0.036 |
| tabular-data-generation-on-travel | CopulaGAN | Accuracy | 73.61 | 73.3 | 73.3 | 0.157 |
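The accuracy metrics are higher-is-better, while the california-housing rows report mean squared error, where lower is better, so comparing methods across benchmarks requires flipping the direction for that one benchmark. The sketch below re-reads only the RF figures from the results listed above and picks the best method per benchmark under that rule; the dictionary layout and the helpers `best_method` and `win_counts` are illustrative, not part of the original leaderboard.

```python
from collections import Counter

# RF figures copied from the benchmark results: benchmark -> {method: score}.
RF_SCORES = {
    "adult-census": {"CopulaGAN": 80.46, "TVAE": 83.48, "CTGAN": 83.53},
    "california-housing": {"CTGAN": 0.62, "TVAE": 0.35, "CopulaGAN": 0.99},
    "diabetes": {"CTGAN": 0.5223, "TVAE": 0.5517, "CopulaGAN": 0.3759},
    "heloc": {"CTGAN": 62.35, "TVAE": 77.24, "CopulaGAN": 42.35},
    "sick": {"CopulaGAN": 94.57, "TVAE": 94.91, "CTGAN": 94.57},
    "travel": {"CTGAN": 71.41, "TVAE": 81.68, "CopulaGAN": 73.3},
}

# Benchmarks scored by mean squared error, where a lower value is better.
LOWER_IS_BETTER = {"california-housing"}


def best_method(benchmark):
    """Return the method with the best RF score on one benchmark."""
    scores = RF_SCORES[benchmark]
    pick = min if benchmark in LOWER_IS_BETTER else max
    return pick(scores, key=scores.get)


def win_counts():
    """Count how many benchmarks each method wins on the RF column."""
    return dict(Counter(best_method(b) for b in RF_SCORES))


if __name__ == "__main__":
    for benchmark in RF_SCORES:
        print(benchmark, best_method(benchmark))
    print(win_counts())
```

On the RF figures alone, TVAE scores best on five of the six benchmarks and CTGAN on adult-census. Scores are only compared within a benchmark, so the different scales (percentages for most benchmarks, 0 to 1 for diabetes) do not affect the result.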
