Vitaliy Kinakh, Slava Voloshynovskiy

Abstract
Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular tabular benchmark datasets, demonstrating that Binary Diffusion outperforms existing state-of-the-art models on the Travel, Adult Income, and Diabetes datasets while being significantly smaller in size.
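The two ideas named in the abstract, a lossless fixed-size binary encoding and XOR-based noising, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify bit widths, the noise schedule, or what the network predicts, so the 64-bit float encoding, the index-based category encoding, the `flip_prob` parameter, and all function names below are assumptions chosen for illustration. The sketch also assumes the noise mask is known when removing noise; in a trained model it would have to be estimated.

```python
import random
import struct


def float_to_bits(x: float) -> list[int]:
    """Encode a float as its 64-bit IEEE 754 pattern (assumed width)."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return [(as_int >> i) & 1 for i in reversed(range(64))]


def bits_to_float(bits: list[int]) -> float:
    """Invert float_to_bits exactly."""
    as_int = 0
    for b in bits:
        as_int = (as_int << 1) | b
    (x,) = struct.unpack(">d", struct.pack(">Q", as_int))
    return x


def category_width(vocabulary: list[str]) -> int:
    """Smallest number of bits that can index every category."""
    return max(1, (len(vocabulary) - 1).bit_length())


def category_to_bits(value: str, vocabulary: list[str]) -> list[int]:
    """Encode a categorical value as the binary index of its vocabulary entry."""
    index = vocabulary.index(value)
    width = category_width(vocabulary)
    return [(index >> i) & 1 for i in reversed(range(width))]


def bits_to_category(bits: list[int], vocabulary: list[str]) -> str:
    """Invert category_to_bits."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return vocabulary[index]


def xor_noise(bits: list[int], flip_prob: float, rng: random.Random):
    """Add noise by XOR-ing with a random mask; each mask bit is 1 with flip_prob."""
    mask = [1 if rng.random() < flip_prob else 0 for _ in bits]
    noisy = [b ^ m for b, m in zip(bits, mask)]
    return noisy, mask


def xor_denoise(noisy: list[int], mask: list[int]) -> list[int]:
    """Remove noise by XOR-ing with the same mask (XOR is its own inverse)."""
    return [b ^ m for b, m in zip(noisy, mask)]


# Hypothetical two-column row: one numeric value and one categorical value.
vocab = ["Private", "Self-emp", "Gov"]
row_bits = float_to_bits(38.5) + category_to_bits("Gov", vocab)

rng = random.Random(0)
noisy_bits, mask = xor_noise(row_bits, flip_prob=0.3, rng=rng)
restored = xor_denoise(noisy_bits, mask)

# The row is recovered bit for bit, and decodes back to the original values.
assert restored == row_bits
assert bits_to_float(restored[:64]) == 38.5
assert bits_to_category(restored[64:], vocab) == "Gov"
```

Because every row maps to the same number of bits regardless of column types, a single model operating on binary vectors can cover numeric and categorical columns alike, which is consistent with the abstract's claim that extensive preprocessing is not needed.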
Benchmarks
| Benchmark | Methodology | Metric | DT | LR | RF | Parameters (M) |
|---|---|---|---|---|---|---|
| tabular-data-generation-on-adult-census | Binary Diffusion | Accuracy | 85.27 | 85.45 | 85.74 | 1.4 |
| tabular-data-generation-on-california-housing | Binary Diffusion | Mean Squared Error | 0.45 | 0.55 | 0.39 | 1.5 |
| tabular-data-generation-on-diabetes | Binary Diffusion | Accuracy | 0.5713 | 0.5775 | 0.5752 | 1.8 |
| tabular-data-generation-on-heloc | Binary Diffusion | Accuracy | 70.25 | 71.76 | 70.47 | 2.6 |
| tabular-data-generation-on-sick | Binary Diffusion | Accuracy | 97.07 | 96.14 | 96.59 | 1.4 |
| tabular-data-generation-on-travel | Binary Diffusion | Accuracy | 88.9 | 83.79 | 89.95 | 1.1 |