HyperAI

3 months ago

Tabular Data Generation using Binary Diffusion

Vitaliy Kinakh, Slava Voloshynovskiy

Abstract

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular tabular benchmark datasets, demonstrating that Binary Diffusion outperforms existing state-of-the-art models on the Travel, Adult Income, and Diabetes datasets while being significantly smaller in size.
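The abstract names three mechanisms: a lossless row-to-bits transform, XOR-based noise addition and removal, and binary cross-entropy (BCE) training. The following plain-Python sketch illustrates each one under stated assumptions; it is not the authors' implementation (for that, see the official repository vkinakh/binary-diffusion-tabular). Assumptions: numeric columns are stored as the 32 bits of their float32 value (lossless only for values exactly representable in float32); categorical columns are stored as fixed-width binary indices; a single flip probability `flip_prob` stands in for a per-timestep noise schedule; and the denoiser is assumed to output per-bit probabilities of the flip mask. All function names (`encode_row`, `add_xor_noise`, `remove_xor_noise`, etc.) are hypothetical.

```python
import math
import random
import struct

# Illustrative sketch only; all names are hypothetical, not the official code.

def field_width(kind, categories):
    """Bits per field: 32 for numeric (float32), ceil(log2(K)) (min 1) for categorical."""
    if kind == "num":
        return 32
    return max(1, math.ceil(math.log2(len(categories))))

def int_to_bits(value, width):
    """Big-endian list of `width` bits."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

def bits_to_int(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def encode_row(row, schema):
    """Map one table row to a fixed-length bit vector (assumed encoding)."""
    bits = []
    for value, (kind, cats) in zip(row, schema):
        width = field_width(kind, cats)
        if kind == "num":
            # Raw IEEE-754 float32 bit pattern of the value.
            (as_int,) = struct.unpack(">I", struct.pack(">f", value))
        else:
            as_int = cats.index(value)
        bits += int_to_bits(as_int, width)
    return bits

def decode_row(bits, schema):
    """Inverse of encode_row; out-of-range category codes are clamped to the last category."""
    row, pos = [], 0
    for kind, cats in schema:
        width = field_width(kind, cats)
        as_int = bits_to_int(bits[pos:pos + width])
        if kind == "num":
            row.append(struct.unpack(">f", struct.pack(">I", as_int))[0])
        else:
            row.append(cats[min(as_int, len(cats) - 1)])
        pos += width
    return row

def add_xor_noise(x0, flip_prob, rng):
    """Forward noising: x_t = x_0 XOR z with z_i ~ Bernoulli(flip_prob)."""
    z = [1 if rng.random() < flip_prob else 0 for _ in x0]
    return [a ^ b for a, b in zip(x0, z)], z

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)

def remove_xor_noise(xt, noise_probs):
    """Denoising step: threshold predicted flip probabilities at 0.5 and XOR them out."""
    z_hat = [1 if p > 0.5 else 0 for p in noise_probs]
    return [a ^ b for a, b in zip(xt, z_hat)]

if __name__ == "__main__":
    schema = [("num", None), ("cat", ["a", "b", "c"])]
    row = [0.5, "c"]                        # 0.5 is exactly representable in float32
    x0 = encode_row(row, schema)            # 32 + 2 = 34 bits
    assert decode_row(x0, schema) == row
    xt, z = add_xor_noise(x0, 0.3, random.Random(0))
    # Stand-in for a trained network: a perfect predictor of the flip mask z.
    oracle = [float(b) for b in z]
    print("BCE of oracle:", bce(oracle, z))  # near 0 because of the eps clamp
    assert remove_xor_noise(xt, oracle) == x0
```

Because XOR is its own inverse, knowing the flip mask exactly gives x_0 = x_t XOR z. In this sketch, BCE scores how well per-bit probabilities match that mask; a real model would also condition on the timestep and apply the reverse step iteratively.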

Code Repositories

vkinakh/binary-diffusion-tabular (official implementation, PyTorch)

Benchmarks

Results reported for Binary Diffusion on each tabular data generation benchmark. LR, DT, and RF denote logistic regression (linear regression for California Housing), decision tree, and random forest models. Accuracy is given in percent (higher is better); MSE is mean squared error (lower is better).

Benchmark            Metric          LR      DT      RF      Params (M)
Adult Census         Accuracy (%)    85.45   85.27   85.74   1.4
California Housing   MSE             0.55    0.45    0.39    1.5
Diabetes             Accuracy (%)    57.75   57.13   57.52   1.8
HELOC                Accuracy (%)    71.76   70.25   70.47   2.6
Sick                 Accuracy (%)    96.14   97.07   96.59   1.4
Travel               Accuracy (%)    83.79   88.9    89.95   1.1
