Synthesizers API Reference

MWEM

class snsynth.MWEMSynthesizer(epsilon=3.0, *ignore, q_count=None, iterations=None, splits=[], split_factor=None, marginal_width=None, add_ranges=False, measure_only=False, max_retries_exp_mechanism=10, mult_weights_iterations=20, debug=False)

Bases: snsynth.base.SDGYMBaseSynthesizer

fit(data, categorical_columns=None, ordinal_columns=None)

Fit the synthesizer model on the data.

Parameters
  • data (pd.DataFrame) – The data for fitting the synthesizer model.

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame

mwem()

Runner for the MWEM algorithm. Initializes the synthetic histogram and updates it for up to self.max_iterations rounds, using the exponential mechanism and multiplicative weights. Measurements are drawn from the initialized query store.

Returns

synth_hist, self.histogram, where synth_hist is the synthetic data histogram and self.histogram is the original histogram.

Return type

np.ndarray, np.ndarray
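
The loop described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name mwem_sketch, the representation of queries as lists of bin indices, and the even split of each round's budget between selection and measurement are all assumptions made for illustration. It uses only the standard library and a one-dimensional histogram.

```python
import math
import random


def mwem_sketch(hist, queries, epsilon=3.0, iterations=4,
                mult_weights_iterations=20, seed=0):
    """Toy MWEM on a 1-D histogram (hypothetical helper, not snsynth code).

    Each query is a list of bin indices and is answered by summing those bins.
    Assumes sum(hist) > 0.
    """
    rng = random.Random(seed)
    n = float(sum(hist))
    # Start from a uniform synthetic histogram with the same total count.
    synth = [n / len(hist)] * len(hist)
    # Assumption: each round gets epsilon / iterations, split evenly between
    # query selection and query measurement.
    eps_half = epsilon / iterations / 2.0
    measurements = {}

    def answer(query, h):
        return sum(h[i] for i in query)

    for _ in range(iterations):
        # Exponential mechanism: prefer queries the synthetic data answers badly.
        errors = [abs(answer(q, hist) - answer(q, synth)) for q in queries]
        top = max(errors)  # subtracted only for numerical stability
        weights = [math.exp(eps_half * (e - top) / 2.0) for e in errors]
        chosen = rng.choices(range(len(queries)), weights=weights)[0]

        # Laplace mechanism (sensitivity 1): the difference of two exponential
        # draws with mean `scale` is Laplace-distributed with that scale.
        scale = 1.0 / eps_half
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        measurements[chosen] = answer(queries[chosen], hist) + noise

        # Multiplicative weights: nudge the bins covered by each measured
        # query toward its noisy answer, then renormalize to total n.
        for _ in range(mult_weights_iterations):
            for qi, measured in measurements.items():
                members = set(queries[qi])
                err = measured - answer(queries[qi], synth)
                factor = math.exp(err / (2.0 * n))
                synth = [v * factor if i in members else v
                         for i, v in enumerate(synth)]
                total = sum(synth)
                synth = [v * n / total for v in synth]

    return synth, hist
```

As with mwem(), the sketch returns the synthetic histogram followed by the original one. The synthetic histogram keeps the original total count because it is renormalized after every update.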

sample(samples, categorical_columns=None, ordinal_columns=None)

Sample from the synthesizer model.

Parameters
  • samples (int) – The number of samples to create

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame

property spent

QUAIL

class snsynth.QUAILSynthesizer(epsilon, dp_synthesizer, dp_classifier, target, test_size=0.2, seed=None, eps_split=0.9)

Bases: snsynth.base.SDGYMBaseSynthesizer

fit(data, categorical_columns=None, ordinal_columns=None)

Fit the synthesizer model on the data.

Parameters
  • data (pd.DataFrame) – The data for fitting the synthesizer model.

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame

sample(samples, categorical_columns=None, ordinal_columns=None)

Sample from the synthesizer model.

Parameters
  • samples (int) – The number of samples to create

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame
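
The constructor takes a total epsilon together with eps_split. A minimal sketch of how such a split could be computed follows. It assumes that eps_split is the fraction of the budget given to the classifier and that the remainder goes to the synthesizer; this page does not state that, so treat it as an assumption to verify against the source. The helper name split_budget is hypothetical.

```python
def split_budget(epsilon, eps_split=0.9):
    """Split a total privacy budget into two parts (hypothetical helper).

    Assumption: eps_split is the classifier's share, and the synthesizer
    receives the rest. Check the library source before relying on this.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < eps_split < 1.0:
        raise ValueError("eps_split must lie strictly between 0 and 1")
    classifier_eps = epsilon * eps_split
    synthesizer_eps = epsilon - classifier_eps
    return classifier_eps, synthesizer_eps
```

Under this assumption, the two parts always sum to the total epsilon, so the combined procedure stays within the stated budget by sequential composition.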

PyTorch

class snsynth.pytorch.PytorchDPSynthesizer(epsilon, gan, preprocessor=None)

Bases: snsynth.base.SDGYMBaseSynthesizer

fit(data, categorical_columns=None, ordinal_columns=None)

Fit the synthesizer model on the data.

Parameters
  • data (pd.DataFrame) – The data for fitting the synthesizer model.

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame

sample(samples, categorical_columns=None, ordinal_columns=None)

Sample from the synthesizer model.

Parameters
  • samples (int) – The number of samples to create

  • categorical_columns (list[str], optional) – List of column names for categorical columns, defaults to None

  • ordinal_columns (list[str], optional) – List of column names for ordinal columns, defaults to None

Returns

Dataframe containing the generated data samples.

Return type

pd.DataFrame
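
The wrapper takes a gan object, and the GAN classes documented below expose train(...) and generate(n). The following sketch shows one plausible way a wrapper's fit and sample could delegate to those two methods. Both classes here are hypothetical stand-ins: ToyGAN does no learning and offers no privacy, rows are plain dictionaries rather than a DataFrame, and the delegation shown is an assumption rather than the library's confirmed behavior.

```python
import random


class ToyGAN:
    """Hypothetical stand-in with the same train/generate method names.

    It only records the values seen per column, so it is NOT private.
    """

    def __init__(self, epsilon=1.0, seed=0):
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        self._columns = {}

    def train(self, data, categorical_columns=None, ordinal_columns=None,
              update_epsilon=None):
        if update_epsilon is not None:
            self.epsilon = update_epsilon
        # Collect the observed values of every column.
        self._columns = {}
        for row in data:
            for name, value in row.items():
                self._columns.setdefault(name, []).append(value)

    def generate(self, n):
        # Draw each column independently from its observed values.
        return [{name: self._rng.choice(values)
                 for name, values in self._columns.items()}
                for _ in range(n)]


class WrapperSketch:
    """Hypothetical wrapper mirroring the fit/sample signatures above."""

    def __init__(self, epsilon, gan, preprocessor=None):
        self.epsilon = epsilon
        self.gan = gan
        self.preprocessor = preprocessor  # unused in this sketch

    def fit(self, data, categorical_columns=None, ordinal_columns=None):
        # Assumed delegation: pass the wrapper's budget on to the GAN.
        self.gan.train(data,
                       categorical_columns=categorical_columns,
                       ordinal_columns=ordinal_columns,
                       update_epsilon=self.epsilon)

    def sample(self, samples, categorical_columns=None, ordinal_columns=None):
        return self.gan.generate(samples)


rows = [{"age": 31, "sex": "F"}, {"age": 45, "sex": "M"}]
synth = WrapperSketch(epsilon=1.0, gan=ToyGAN())
synth.fit(rows, categorical_columns=["sex"])
generated = synth.sample(5)
```

The point of the sketch is the shape of the interface: the wrapper owns the budget and the column metadata, while the wrapped object only needs to provide train and generate.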

class snsynth.pytorch.nn.PATEGAN(epsilon, delta=None, binary=False, latent_dim=64, batch_size=64, teacher_iters=5, student_iters=5)

Bases: object

generate(n)

train(data, categorical_columns=None, ordinal_columns=None, update_epsilon=None, transformer=None, continuous_columns_lower_upper=None)

class snsynth.pytorch.nn.DPGAN(binary=False, latent_dim=64, batch_size=64, epochs=1000, delta=None, epsilon=1.0)

Bases: object

generate(n)

train(data, categorical_columns=None, ordinal_columns=None, update_epsilon=None, transformer=None, continuous_columns_lower_upper=None)

class snsynth.pytorch.nn.PATECTGAN(embedding_dim=128, generator_dim=(256, 256), discriminator_dim=(256, 256), generator_lr=0.0002, generator_decay=1e-06, discriminator_lr=0.0002, discriminator_decay=1e-06, batch_size=500, discriminator_steps=1, log_frequency=False, verbose=False, epochs=300, pac=1, cuda=True, epsilon=1, binary=False, regularization=None, loss='cross_entropy', teacher_iters=5, student_iters=5, sample_per_teacher=1000, delta=None, noise_multiplier=0.001, preprocessor_eps=1, moments_order=100, category_epsilon_pct=0.1)

Bases: ctgan.synthesizers.ctgan.CTGANSynthesizer

__init__(embedding_dim=128, generator_dim=(256, 256), discriminator_dim=(256, 256), generator_lr=0.0002, generator_decay=1e-06, discriminator_lr=0.0002, discriminator_decay=1e-06, batch_size=500, discriminator_steps=1, log_frequency=False, verbose=False, epochs=300, pac=1, cuda=True, epsilon=1, binary=False, regularization=None, loss='cross_entropy', teacher_iters=5, student_iters=5, sample_per_teacher=1000, delta=None, noise_multiplier=0.001, preprocessor_eps=1, moments_order=100, category_epsilon_pct=0.1)

generate(n, condition_column=None, condition_value=None)

TODO: Add condition_column support from CTGAN

train(data, categorical_columns=None, ordinal_columns=None, update_epsilon=None, transformer=<class 'snsynth.preprocessors.data_transformer.BaseTransformer'>, continuous_columns_lower_upper=None)

w_loss(output, labels)