Synthesizer Reference

class snsynth.base.Synthesizer(*args, **kwargs)
classmethod create(synth=None, epsilon=None, *args, **kwargs)

Create a differentially private synthesizer.

Parameters
  • synth (str or Synthesizer class, required) – The name of the synthesizer to create. If called from an instance of a Synthesizer subclass, creates an instance of the specified synthesizer. Allowed synthesizers are available from the list_synthesizers() method.

  • epsilon (float, required) – The privacy budget to be allocated to the synthesizer. This budget will be used when the synthesizer is fit to the data.

  • args (list, optional) – Positional arguments to pass to the synthesizer constructor.

  • kwargs (dict, optional) – Keyword arguments to pass to the synthesizer constructor. At a minimum, the epsilon value must be provided. Any other hyperparameters can be provided here. See the documentation for each specific synthesizer for details about the available hyperparameters.

fit(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False)

Fit the synthesizer model on the data.

Parameters
  • data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.

  • transformer (snsynth.transform.TableTransformer or dictionary, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer(). The inferred preprocessor can be constrained for certain columns by providing a dictionary. Read the TableTransformer.create() method documentation for details about the constraints.

  • categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.

  • ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.

  • continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.

  • preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.

  • nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers. Defaults to False.
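The budget arithmetic described for preprocessor_eps can be illustrated with a small standalone sketch. It does not call snsynth: the helper name remaining_training_budget and the needs_private_preprocessing flag are hypothetical, and the exact checks the library performs may differ.

```python
def remaining_training_budget(epsilon, preprocessor_eps=0.0,
                              needs_private_preprocessing=False):
    """Return the epsilon left for training after preprocessing.

    Hypothetical helper for illustration only; not part of snsynth.
    """
    if not needs_private_preprocessing:
        # Preprocessing spends nothing, so the full budget goes to training.
        return epsilon
    if preprocessor_eps <= 0.0:
        # Documented behavior: an insufficient preprocessing budget is an error.
        raise ValueError("preprocessing needs budget, but preprocessor_eps is 0")
    if preprocessor_eps >= epsilon:
        # Assumption: some budget must remain for fitting the model itself.
        raise ValueError("preprocessor_eps must be smaller than epsilon")
    return epsilon - preprocessor_eps


# Bounds must be inferred privately: 1.0 of the 3.0 budget is spent up front.
print(remaining_training_budget(3.0, 1.0, needs_private_preprocessing=True))
# No private preprocessing needed: the whole budget remains.
print(remaining_training_budget(3.0, 1.0))
```

Under these assumptions, leaving preprocessor_eps at its default of 0.0 only works when the transformer needs no privacy budget, for example when bounds for continuous columns are supplied rather than inferred.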

fit_sample(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False, **kwargs)

Fit the synthesizer model and then generate a synthetic dataset of the same size as the input data.

Parameters
  • data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.

  • transformer (snsynth.transform.TableTransformer or dict, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer(). The inferred preprocessor can be constrained for certain columns by providing a dictionary. Read the TableTransformer.create() method documentation for details about the constraints.

  • categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.

  • ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.

  • continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.

  • preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.

  • nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers. Defaults to False.

classmethod list_synthesizers()

List the available synthesizers.

Returns

List of available synthesizer names.

Return type

list[str]

sample(n_rows)

Sample rows from the synthesizer.

Parameters

n_rows (int) – The number of rows to create.

Returns

Data set containing the generated data samples.

Return type

pd.DataFrame, np.ndarray, or list of tuples
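As a minimal sketch of how create(), fit() and sample() fit together, the function below wraps the three calls. It assumes the smartnoise-synth package is installed and that "mwem" is one of the names returned by list_synthesizers(); the wrapper name fit_and_sample and its default budget values are illustrative choices, not recommendations.

```python
def fit_and_sample(data, n_rows, epsilon=3.0, preprocessor_eps=1.0):
    """Fit a synthesizer on private `data` and return `n_rows` synthetic rows.

    Illustrative wrapper, not part of snsynth. The import is deferred so
    that defining the function does not require the package.
    """
    from snsynth import Synthesizer

    # Assumption: "mwem" is among the names from list_synthesizers().
    synth = Synthesizer.create("mwem", epsilon=epsilon)
    # No transformer is passed, so one is inferred; preprocessor_eps covers
    # any budget that inference needs (e.g. bounds for continuous columns).
    synth.fit(data, preprocessor_eps=preprocessor_eps)
    return synth.sample(n_rows)
```

A call such as fit_and_sample(df, 1000), where df is a pd.DataFrame holding the private data, would return 1000 synthetic rows. Using fit_sample() instead of separate fit() and sample() calls yields a dataset of the same size as the input.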

sample_conditional(n_rows, condition, max_tries=100, column_names=None)

Generates a synthetic dataset that satisfies the given condition. Performs a rejection sampling process in which up to max_tries batches are sampled. If not enough valid samples are found, the partial dataset is returned.

Parameters
  • n_rows (int, required) – Number of rows to sample.

  • condition (str, required) – Condition to evaluate. Needs to be a valid SQL WHERE clause, e.g. “age < 50 AND income > 1000”.

  • max_tries (int, optional) – Maximum number of batches to sample while trying to collect enough valid rows. Defaults to 100.

  • column_names (list[str], optional) – List of column names, required if the sampled data is not a pd.DataFrame.

Returns

Data set containing the generated data samples.

Return type

pd.DataFrame, np.ndarray, or list of tuples
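The rejection sampling process described for sample_conditional can be sketched in plain Python. This is not the library's implementation: sample_batch stands in for a fitted synthesizer's sample method, a Python predicate replaces the SQL WHERE clause, and the batch size of n_rows is an assumption.

```python
import random


def sample_conditional_sketch(sample_batch, n_rows, predicate, max_tries=100):
    """Rejection sampling: draw batches and keep rows satisfying `predicate`.

    Hypothetical stand-in for illustration; `sample_batch(n)` must return
    a list of n rows.
    """
    accepted = []
    for _ in range(max_tries):
        batch = sample_batch(n_rows)
        accepted.extend(row for row in batch if predicate(row))
        if len(accepted) >= n_rows:
            break
    # Shorter than n_rows when max_tries batches did not yield enough rows.
    return accepted[:n_rows]


rng = random.Random(0)


def toy_sampler(n):
    # Hypothetical generator of (age, income) rows.
    return [(rng.randint(18, 90), rng.randint(0, 5000)) for _ in range(n)]


# Python equivalent of the condition "age < 50 AND income > 1000".
rows = sample_conditional_sketch(
    toy_sampler, 5, lambda r: r[0] < 50 and r[1] > 1000
)
print(rows)
```

If the condition is rarely satisfied, the loop exhausts max_tries and the result contains fewer than n_rows rows, which matches the partial-dataset behavior described above.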

Marginal Synthesizers

Neural Network Synthesizers

Hybrid Synthesizers