Synthesizer Reference

class snsynth.base.Synthesizer(*args, **kwargs)[source]
classmethod create(synth=None, epsilon=None, *args, **kwargs)[source]

Create a differentially private synthesizer.

Parameters
  • synth (str or Synthesizer class, required) – The name of the synthesizer to create. If called from an instance of a Synthesizer subclass, creates an instance of the specified synthesizer. Allowed synthesizers are available from the list_synthesizers() method.

  • epsilon (float, required) – The privacy budget to be allocated to the synthesizer. This budget will be used when the synthesizer is fit to the data.

  • args (list, optional) – Positional arguments to pass to the synthesizer constructor.

  • kwargs (dict, optional) – Keyword arguments to pass to the synthesizer constructor. At a minimum, the epsilon value must be provided. Any other hyperparameters can be provided here. See the documentation for each specific synthesizer for details about the available hyperparameters.
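
A minimal sketch of how create() and list_synthesizers() might be combined, assuming the snsynth package is installed and exports Synthesizer at the top level. The helper name build_synthesizer is hypothetical and not part of the library; it only checks the requested name before delegating to create().

```python
def build_synthesizer(name, epsilon, **hyperparameters):
    """Create a synthesizer by name after checking that the name is registered."""
    # Lazy import: keeps this sketch loadable where snsynth is not installed.
    from snsynth import Synthesizer

    available = Synthesizer.list_synthesizers()
    if name not in available:
        raise ValueError(
            f"Unknown synthesizer {name!r}; choose one of {sorted(available)}"
        )
    # epsilon is the total privacy budget; any extra keywords are passed
    # through to the constructor of the chosen synthesizer.
    return Synthesizer.create(name, epsilon=epsilon, **hyperparameters)
```

Under these assumptions, build_synthesizer("mwem", epsilon=3.0) would return an unfitted synthesizer, provided "mwem" appears in the list returned by list_synthesizers().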

fit(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False)

Fit the synthesizer model on the data.

Parameters
  • data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.

  • transformer (snsynth.transform.TableTransformer, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer().

  • categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.

  • ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.

  • continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.

  • preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.

  • nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers.
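
The budget arithmetic described for preprocessor_eps can be sketched in plain Python. This is one reading of the parameter description, not the library's implementation: the function plan_budget and its preprocessing_needs_budget flag are hypothetical, and the check that preprocessor_eps must be smaller than epsilon is an added assumption.

```python
def plan_budget(epsilon, preprocessor_eps=0.0, preprocessing_needs_budget=False):
    """Return (preprocessing_budget, fitting_budget) for a single fit call."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not preprocessing_needs_budget:
        # Nothing private has to be inferred, so preprocessor_eps is not spent.
        return 0.0, epsilon
    if preprocessor_eps <= 0.0:
        # Mirrors the documented error: preprocessing needs budget, none given.
        raise ValueError("preprocessing requires budget; pass preprocessor_eps > 0")
    if preprocessor_eps >= epsilon:
        # Assumption: some budget must remain for fitting the model itself.
        raise ValueError("preprocessor_eps must be smaller than epsilon")
    # The preprocessing share is subtracted from the total budget.
    return preprocessor_eps, epsilon - preprocessor_eps


print(plan_budget(3.0, preprocessor_eps=1.0, preprocessing_needs_budget=True))
# (1.0, 2.0)
```

With a total budget of 3.0 and preprocessor_eps=1.0, for example when bounds must be inferred for continuous columns, 2.0 would remain for fitting under this reading.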

fit_sample(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False, **kwargs)

Fit the synthesizer model and then generate a synthetic dataset of the same size as the input data.

Parameters
  • data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.

  • transformer (snsynth.transform.TableTransformer, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer().

  • categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.

  • ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.

  • continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.

  • preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.

  • nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers.

classmethod list_synthesizers()[source]

List the available synthesizers.

Returns

List of available synthesizer names.

Return type

list[str]

sample(n_rows)

Sample rows from the synthesizer.

Parameters

  • n_rows (int) – The number of rows to create.

Returns

Data set containing the generated data samples.

Return type

pd.DataFrame, np.ndarray, or list of tuples
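
The methods above can be chained into a create, fit, sample workflow. The following is a sketch under stated assumptions: snsynth is installed and exports Synthesizer, "mwem" is among the names returned by list_synthesizers(), and the wrapper synthesize with its default budgets is hypothetical. Because the fit signature places *ignore after data, the column hints and preprocessor_eps have to be passed by keyword.

```python
def synthesize(data, n_rows, categorical_columns, continuous_columns,
               epsilon=3.0, preprocessor_eps=1.0):
    """Create a synthesizer, fit it with column hints, and sample n_rows rows."""
    # Lazy import: keeps this sketch loadable where snsynth is not installed.
    from snsynth import Synthesizer

    # Assumption: "mwem" is an available synthesizer name.
    synth = Synthesizer.create("mwem", epsilon=epsilon)
    synth.fit(
        data,
        categorical_columns=categorical_columns,
        continuous_columns=continuous_columns,
        # Budget set aside for preprocessing, e.g. inferring continuous bounds.
        preprocessor_eps=preprocessor_eps,
    )
    return synth.sample(n_rows)
```

No transformer is passed here, so the synthesizer would choose one itself and use the column lists as hints.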

Marginal Synthesizers

Neural Network Synthesizers

Hybrid Synthesizers