Modeling Tabular data using Conditional GAN

Lei Xu; Maria Skoularidou; Alfredo Cuesta-Infante; Kalyan Veeramachaneni

Abstract

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, which makes modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design CTGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. CTGAN outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods could not.
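
The two difficulties named in the abstract can be illustrated with a small toy example. The sketch below is not the paper's method: the column names, the 2% minority rate, the mode locations, and the uniform choice over categories are all assumptions made for illustration. It shows that a single mean sits in the empty gap of a bimodal column, and that picking a category first and then a matching row exposes a rare category far more often than uniform row sampling does.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy table: one imbalanced discrete column ("label") and one
# continuous column ("value") with two modes. All numbers are illustrative.
def make_row():
    label = "rare" if random.random() < 0.02 else "common"
    centre = 10.0 if random.random() < 0.5 else -10.0
    return {"label": label, "value": random.gauss(centre, 1.0)}

rows = [make_row() for _ in range(10_000)]

# Challenge 1: the mean of a bimodal column lies between the modes,
# where almost no real values occur.
values = [r["value"] for r in rows]
mean = sum(values) / len(values)
near_mean = sum(abs(v - mean) < 2.0 for v in values) / len(values)

# Challenge 2: uniform row sampling rarely shows the minority category.
uniform_batch = Counter(random.choice(rows)["label"] for _ in range(500))

# Conditioning idea (simplified assumption): choose the category first,
# then choose a row that matches it.
by_label = {}
for r in rows:
    by_label.setdefault(r["label"], []).append(r)

def sample_conditioned():
    label = random.choice(sorted(by_label))
    return random.choice(by_label[label])

conditioned_batch = Counter(sample_conditioned()["label"] for _ in range(500))

print(f"fraction of values within 2.0 of the mean: {near_mean:.4f}")
print("uniform batch:", dict(uniform_batch))
print("conditioned batch:", dict(conditioned_batch))
```

Choosing categories uniformly is the simplest possible conditioning rule; a real model may weight categories differently.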

Benchmarks

tabular-data-generation-on-adult-census (metric: accuracy)
Method      DT       LR       RF       Parameters (M)
CopulaGAN   76.29    80.61    80.46    0.300
TVAE        82.8     80.53    83.48    0.053
CTGAN       81.32    83.2     83.53    0.302

tabular-data-generation-on-california-housing (metric: mean squared error)
Method      DT       LR       RF       Parameters (M)
CTGAN       0.82     0.61     0.62     0.197
TVAE        0.45     0.65     0.35     0.045
CopulaGAN   1.19     0.98     0.99     0.201

tabular-data-generation-on-diabetes (metric: accuracy)
Method      DT       LR       RF       Parameters (M)
CTGAN       0.4973   0.5093   0.5223   9.6
TVAE        0.5330   0.5634   0.5517   0.359
CopulaGAN   0.385    0.4027   0.3759   9.4

tabular-data-generation-on-heloc (metric: accuracy)
Method      DT       LR       RF       Parameters (M)
CTGAN       61.34    57.72    62.35    0.277
TVAE        76.39    71.04    77.24    62
CopulaGAN   42.36    42.03    42.35    0.276

tabular-data-generation-on-sick (metric: accuracy)
Method      DT       LR       RF       Parameters (M)
CopulaGAN   93.77    94.57    94.57    0.226
TVAE        95.39    94.7     94.91    0.046
CTGAN       92.05    94.44    94.57    0.222

tabular-data-generation-on-travel (metric: accuracy)
Method      DT       LR       RF       Parameters (M)
CTGAN       73.3     73.3     71.41    0.155
TVAE        81.68    79.58    81.68    0.036
CopulaGAN   73.61    73.3     73.3     0.157
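
To read the listing above at a glance, the following sketch picks the best-scoring method per benchmark using the RF column. The choice of the RF column as the ranking criterion and the helper name `best_by_rf` are assumptions made for illustration; the numbers are copied from the listing. Accuracy is treated as higher-is-better and mean squared error as lower-is-better.

```python
# (benchmark, method, metric, DT, LR, RF) copied from the listing above.
results = [
    ("adult-census", "CopulaGAN", "accuracy", 76.29, 80.61, 80.46),
    ("adult-census", "TVAE", "accuracy", 82.8, 80.53, 83.48),
    ("adult-census", "CTGAN", "accuracy", 81.32, 83.2, 83.53),
    ("california-housing", "CTGAN", "mse", 0.82, 0.61, 0.62),
    ("california-housing", "TVAE", "mse", 0.45, 0.65, 0.35),
    ("california-housing", "CopulaGAN", "mse", 1.19, 0.98, 0.99),
    ("diabetes", "CTGAN", "accuracy", 0.4973, 0.5093, 0.5223),
    ("diabetes", "TVAE", "accuracy", 0.5330, 0.5634, 0.5517),
    ("diabetes", "CopulaGAN", "accuracy", 0.385, 0.4027, 0.3759),
    ("heloc", "CTGAN", "accuracy", 61.34, 57.72, 62.35),
    ("heloc", "TVAE", "accuracy", 76.39, 71.04, 77.24),
    ("heloc", "CopulaGAN", "accuracy", 42.36, 42.03, 42.35),
    ("sick", "CopulaGAN", "accuracy", 93.77, 94.57, 94.57),
    ("sick", "TVAE", "accuracy", 95.39, 94.7, 94.91),
    ("sick", "CTGAN", "accuracy", 92.05, 94.44, 94.57),
    ("travel", "CTGAN", "accuracy", 73.3, 73.3, 71.41),
    ("travel", "TVAE", "accuracy", 81.68, 79.58, 81.68),
    ("travel", "CopulaGAN", "accuracy", 73.61, 73.3, 73.3),
]

def best_by_rf(rows):
    """Return {benchmark: method} for the best RF score on each benchmark."""
    best = {}
    for bench, method, metric, _dt, _lr, rf in rows:
        # Negate MSE so that "larger is better" holds for every metric.
        score = rf if metric == "accuracy" else -rf
        if bench not in best or score > best[bench][1]:
            best[bench] = (method, score)
    return {bench: method for bench, (method, _) in best.items()}

winners = best_by_rf(results)
for bench, method in winners.items():
    print(f"{bench}: {method}")
```

Ranking by the DT or LR column instead only requires changing which field feeds `score`.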
