Differentially Private Conditional Tabular GAN (DPCTGAN)

A conditional tabular GAN (CTGAN) trained with differentially private stochastic gradient descent (DP-SGD). Based on “Modeling Tabular Data using Conditional GAN”.

import pandas as pd
from snsynth import Synthesizer

pums = pd.read_csv("PUMS.csv")

synth = Synthesizer.create("dpctgan", epsilon=3.0, verbose=True)
synth.fit(pums, preprocessor_eps=1.0)
pums_synth = synth.sample(1000)
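
The example above passes a total `epsilon` to the synthesizer and a separate `preprocessor_eps` to `fit`. A minimal sketch of the budget arithmetic follows. It assumes that whatever the preprocessor spends is deducted from the total budget before training, and that the preprocessor spends its full allotment; this page does not state either. `remaining_training_budget` is a hypothetical helper, not part of snsynth.

```python
def remaining_training_budget(epsilon: float, preprocessor_eps: float) -> float:
    """Return the budget left for GAN training (hypothetical helper).

    Assumes the preprocessor's spend is subtracted from the total epsilon.
    """
    if preprocessor_eps < 0:
        raise ValueError("preprocessor_eps must be non-negative")
    if preprocessor_eps >= epsilon:
        raise ValueError("preprocessor_eps must be smaller than epsilon")
    return epsilon - preprocessor_eps


# Values taken from the example above: epsilon=3.0, preprocessor_eps=1.0
training_eps = remaining_training_budget(3.0, 1.0)
print(training_eps)
```

Under these assumptions, a larger `preprocessor_eps` leaves less budget for training, so the split is a trade-off between preprocessing accuracy and model quality.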

Parameters

class snsynth.pytorch.nn.dpctgan.DPCTGAN(embedding_dim=128, generator_dim=(256, 256), discriminator_dim=(256, 256), generator_lr=0.0002, generator_decay=1e-06, discriminator_lr=0.0002, discriminator_decay=1e-06, batch_size=500, discriminator_steps=1, verbose=True, epochs=300, pac=1, cuda=True, disabled_dp=False, delta=None, sigma=5, max_per_sample_grad_norm=1.0, epsilon=1, loss='cross_entropy')

DPCTGAN Synthesizer.

GAN-based synthesizer that uses conditional masks to learn tabular data.

Parameters

epsilon – Privacy budget for the model.
sigma – The noise scale for the gradients. Noise scale and batch size influence how fast the privacy budget is consumed, which in turn influences the convergence rate and the quality of the synthetic data.
batch_size – The batch size for training the model.
epochs – The number of epochs to train the model.
embedding_dim – The dimensionality of the embedding layer.
generator_dim – The sizes of the generator's layers, one entry per layer.
discriminator_dim – The sizes of the discriminator's layers, one entry per layer.
generator_lr – The learning rate for the generator.
discriminator_lr – The learning rate for the discriminator.
verbose – Whether to print the training progress.
disabled_dp – Trains without differential privacy. Use this to diagnose whether model issues are caused by the privacy mechanism or are simply the result of GAN instability or other hyperparameter problems.
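
The `disabled_dp` diagnostic can be sketched as follows: train once with privacy and once without, then compare how far each synthetic sample drifts from the real data on a categorical column. This is a rough sketch, not the library's own workflow. `fit_with_and_without_dp` and `total_variation` are hypothetical helpers; the `Synthesizer.create`, `fit`, and `sample` calls mirror the example at the top of this page, and passing `disabled_dp` through `create` is an assumption based on the constructor signature. snsynth is imported lazily so the comparison helper can be used on its own.

```python
from collections import Counter


def total_variation(real, synthetic) -> float:
    """Total variation distance between two sequences of categorical values."""
    real, synthetic = list(real), list(synthetic)
    if not real or not synthetic:
        raise ValueError("both inputs must be non-empty")
    p, q = Counter(real), Counter(synthetic)
    keys = set(p) | set(q)
    # Half the L1 distance between the two empirical distributions
    return 0.5 * sum(abs(p[k] / len(real) - q[k] / len(synthetic)) for k in keys)


def fit_with_and_without_dp(data, epsilon=3.0, preprocessor_eps=1.0, n_samples=1000):
    """Train DPCTGAN twice, with and without DP, and return both samples."""
    from snsynth import Synthesizer  # lazy import: requires snsynth installed

    samples = {}
    for label, disabled in (("dp", False), ("no_dp", True)):
        # Assumption: extra keyword arguments reach the DPCTGAN constructor
        synth = Synthesizer.create(
            "dpctgan", epsilon=epsilon, disabled_dp=disabled, verbose=True
        )
        synth.fit(data, preprocessor_eps=preprocessor_eps)
        samples[label] = synth.sample(n_samples)
    return samples
```

For a categorical column `col` of the real data, comparing `total_variation(data[col], samples["dp"][col])` with `total_variation(data[col], samples["no_dp"][col])` gives a first hint: if both distances are large, the problem is more likely GAN instability or hyperparameters than the privacy noise.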