Synthesizer Reference

Table of Contents

Marginal Synthesizers
Neural Network Synthesizers
Hybrid Synthesizers

class snsynth.base.Synthesizer(*args, **kwargs)

classmethod create(synth=None, epsilon=None, *args, **kwargs)

Create a differentially private synthesizer.

Parameters

synth (str or Synthesizer class, required) – The name of the synthesizer to create. If called from an instance of a Synthesizer subclass, creates an instance of the specified synthesizer. Allowed synthesizers are available from the list_synthesizers() method.
epsilon (float, required) – The privacy budget to be allocated to the synthesizer. This budget will be used when the synthesizer is fit to the data.
args (list, optional) – Positional arguments to pass to the synthesizer constructor.
kwargs (dict, optional) – Keyword arguments to pass to the synthesizer constructor. At a minimum, the epsilon value must be provided. Any other hyperparameters can be provided here. See the documentation for each specific synthesizer for details about the available hyperparameters.
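
The call pattern described above can be sketched as follows. This is a minimal illustration, assuming the package is installed and that the class is importable from `snsynth.base` as the class signature indicates. The helper name `build_synthesizer` and its validation steps are hypothetical additions of this sketch, not part of the library.

```python
def build_synthesizer(name: str, epsilon: float, **hyperparameters):
    """Create a synthesizer by name after checking that the name is registered."""
    # Imported lazily so this helper can be defined even where snsynth is absent.
    from snsynth.base import Synthesizer

    available = Synthesizer.list_synthesizers()
    if name not in available:
        raise ValueError(f"unknown synthesizer {name!r}; choose one of {available}")
    # Sanity check added by this sketch: a privacy budget should be positive.
    if epsilon <= 0:
        raise ValueError("epsilon must be a positive privacy budget")
    # Extra keyword arguments are forwarded to the synthesizer constructor.
    return Synthesizer.create(name, epsilon=epsilon, **hyperparameters)
```

Any name returned by list_synthesizers() can be passed as `name`; the remaining keyword arguments are forwarded unchanged as hyperparameters.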

fit(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False)

Fit the synthesizer model on the data.

Parameters

data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.
transformer (snsynth.transform.TableTransformer or dictionary, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer(). The inferred preprocessor can be constrained for certain columns by providing a dictionary. Read the TableTransformer.create() method documentation for details about the constraints.
categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.
ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.
continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.
preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.
nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers. Defaults to False.
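
The budget rule for preprocessor_eps can be read as simple arithmetic. The sketch below is one interpretation of the description above, not the library's code: the function `fitting_budget` and the flag `needs_budget` are hypothetical names, and the check that preprocessing must leave some budget for fitting is an assumption of the sketch.

```python
def fitting_budget(epsilon: float, preprocessor_eps: float = 0.0,
                   needs_budget: bool = False) -> float:
    """Return the epsilon left for fitting after preprocessing takes its share."""
    if not needs_budget:
        # Preprocessing spends nothing, so the full budget goes to fitting.
        return epsilon
    if preprocessor_eps <= 0.0:
        # Mirrors the documented error: preprocessing needs budget but has none.
        raise ValueError("preprocessing needs privacy budget, but preprocessor_eps is 0.0")
    if preprocessor_eps >= epsilon:
        # Assumption of this sketch: some budget must remain for fitting.
        raise ValueError("preprocessor_eps must leave some budget for fitting")
    return epsilon - preprocessor_eps
```

Under this reading, a synthesizer created with epsilon=3.0 and fit with preprocessor_eps=1.0 would have 2.0 left for fitting if bounds had to be inferred for a continuous column, and the full 3.0 otherwise.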

fit_sample(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False, **kwargs)

Fit the synthesizer model and then generate a synthetic dataset the same size as the input data.

Parameters

data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.
transformer (snsynth.transform.TableTransformer or dict, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer(). The inferred preprocessor can be constrained for certain columns by providing a dictionary. Read the TableTransformer.create() method documentation for details about the constraints.
categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.
ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.
continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.
preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.
nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers. Defaults to False.
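
Going by the description above, fit_sample behaves like a call to fit followed by a call to sample with the row count of the input. A minimal sketch of that reading, where `fit_then_sample` is a hypothetical helper and `synth` is any object exposing the fit and sample methods documented here:

```python
def fit_then_sample(synth, data, **fit_kwargs):
    """Fit `synth` on `data`, then draw as many synthetic rows as `data` has."""
    synth.fit(data, **fit_kwargs)
    # len() gives the row count for a DataFrame, a 2-D ndarray, or a list of tuples.
    return synth.sample(len(data))
```

Keyword arguments such as preprocessor_eps or categorical_columns pass straight through to fit in this sketch.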

classmethod list_synthesizers()

List the available synthesizers.

Returns: List of available synthesizer names.
Return type: list[str]

sample(n_rows)

Sample rows from the synthesizer.

Parameters: n_rows (int) – The number of rows to create.
Returns: Data set containing the generated data samples.
Return type: pd.DataFrame, np.ndarray, or list of tuples

sample_conditional(n_rows, condition, max_tries=100, column_names=None)

Generates a synthetic dataset that satisfies the given condition. Performs a rejection sampling process in which up to max_tries batches are sampled. If not enough valid data samples are found, the partial dataset is returned.

Parameters

n_rows (int, required) – Number of rows to sample.
condition (str, required) – Condition to evaluate. Must be a valid SQL WHERE clause, e.g. “age < 50 AND income > 1000”.
max_tries (int, optional) – Number of times to retry sampling until enough rows are generated. Defaults to 100.
column_names (list[str], optional) – List of column names, required if the sampled data is not a pd.DataFrame.

Returns

Data set containing the generated data samples.

Return type

pd.DataFrame, np.ndarray, or list of tuples
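
The rejection sampling loop described for sample_conditional can be sketched in plain Python. This illustrates the technique rather than the library's implementation: the function `rejection_sample`, the `draw_batch` callable, and the use of a Python predicate in place of a SQL WHERE clause are all assumptions of the sketch.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, int]


def rejection_sample(draw_batch: Callable[[int], List[Row]],
                     keep: Callable[[Row], bool],
                     n_rows: int,
                     max_tries: int = 100) -> List[Row]:
    """Collect up to n_rows rows that satisfy `keep`, drawing at most max_tries batches."""
    accepted: List[Row] = []
    for _ in range(max_tries):
        if len(accepted) >= n_rows:
            break
        # Draw a fresh batch and keep only the rows that meet the condition.
        accepted.extend(row for row in draw_batch(n_rows) if keep(row))
    # May be shorter than n_rows if the condition was rarely (or never) met.
    return accepted[:n_rows]


# Stand-in for a fitted synthesizer: draws random rows with two integer columns.
rng = random.Random(0)


def draw_batch(n: int) -> List[Row]:
    return [{"age": rng.randint(18, 90), "income": rng.randint(0, 5000)} for _ in range(n)]


# Python equivalent of the condition "age < 50 AND income > 1000".
rows = rejection_sample(draw_batch, lambda r: r["age"] < 50 and r["income"] > 1000, n_rows=20)
```

A condition that is never satisfied yields an empty result once max_tries batches have been drawn, which corresponds to the partial dataset mentioned above.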

