Synthesizer Reference
- class snsynth.base.Synthesizer(*args, **kwargs)
- classmethod create(synth=None, epsilon=None, *args, **kwargs)
Create a differentially private synthesizer.
- Parameters
synth (str or Synthesizer class, required) – The name of the synthesizer to create. If called from an instance of a Synthesizer subclass, creates an instance of the specified synthesizer. Allowed synthesizers are available from the list_synthesizers() method.
epsilon (float, required) – The privacy budget to be allocated to the synthesizer. This budget will be used when the synthesizer is fit to the data.
args (list, optional) – Positional arguments to pass to the synthesizer constructor.
kwargs (dict, optional) – Keyword arguments to pass to the synthesizer constructor. At a minimum, the epsilon value must be provided. Any other hyperparameters can be provided here. See the documentation for each specific synthesizer for details about the available hyperparameters.
- fit(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False)
Fit the synthesizer model on the data.
- Parameters
data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.
transformer (snsynth.transform.TableTransformer or dict, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer(). The inferred preprocessor can be constrained for certain columns by providing a dictionary. Read the TableTransformer.create() method documentation for details about the constraints.
categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.
ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.
continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.
preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.
nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers. Defaults to False.
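The budget accounting described for preprocessor_eps can be illustrated with a little arithmetic. The sketch below is not the library's code: split_budget and its needs_private_preprocessing flag are hypothetical names, and the exact checks the synthesizer performs are an assumption based on the description above.

```python
def split_budget(epsilon, preprocessor_eps=0.0, needs_private_preprocessing=False):
    """Return (preprocessing_budget, fitting_budget) for a total epsilon.

    Hypothetical helper: it mirrors the documented behaviour, not snsynth internals.
    """
    if not needs_private_preprocessing:
        # Preprocessing spends nothing, so the whole budget goes to fitting.
        return 0.0, epsilon
    if preprocessor_eps <= 0.0:
        # E.g. bounds must be inferred for continuous columns, but no budget was set aside.
        raise ValueError("preprocessing requires privacy budget, but preprocessor_eps is 0.0")
    if preprocessor_eps >= epsilon:
        raise ValueError("preprocessor_eps leaves no budget for fitting the synthesizer")
    # The preprocessing share is subtracted from the budget given at creation time.
    return preprocessor_eps, epsilon - preprocessor_eps


# Continuous columns without known bounds: part of the budget is set aside.
pre_budget, fit_budget = split_budget(1.0, preprocessor_eps=0.25, needs_private_preprocessing=True)

# Categorical-only data: preprocessor_eps is ignored and nothing is subtracted.
unused_pre, full_fit = split_budget(1.0, preprocessor_eps=0.25, needs_private_preprocessing=False)
```

In this sketch the first call leaves 0.75 for fitting, while the second keeps the full 1.0 because the preprocessing step does not consume any budget.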
- fit_sample(data, *ignore, transformer=None, categorical_columns=[], ordinal_columns=[], continuous_columns=[], preprocessor_eps=0.0, nullable=False, **kwargs)
Fit the synthesizer model and then generate a synthetic dataset of the same size as the input data.
- Parameters
data (pd.DataFrame, np.ndarray, or list of tuples) – The private data used to fit the synthesizer.
transformer (snsynth.transform.TableTransformer or dict, optional) – The transformer to use to preprocess the data. If no transformer is provided, the synthesizer will attempt to choose a transformer suitable for that synthesizer. To prevent the synthesizer from choosing a transformer, pass in snsynth.transform.NoTransformer(). The inferred preprocessor can be constrained for certain columns by providing a dictionary. Read the TableTransformer.create() method documentation for details about the constraints.
categorical_columns (list[], optional) – List of column names or indices to be treated as categorical columns, used as hints when no transformer is provided.
ordinal_columns (list[], optional) – List of column names or indices to be treated as ordinal columns, used as hints when no transformer is provided.
continuous_columns (list[], optional) – List of column names or indices to be treated as continuous columns, used as hints when no transformer is provided.
preprocessor_eps (float, optional) – The epsilon value to use when preprocessing the data. This epsilon budget is subtracted from the budget supplied when creating the synthesizer, but is only used if the preprocessing requires privacy budget, for example if bounds need to be inferred for continuous columns. This value defaults to 0.0, and the synthesizer will raise an error if the budget is not sufficient to preprocess the data.
nullable (bool, optional) – Whether to allow null values in the data. This is only used if no transformer is provided, and is used as a hint when inferring transformers. Defaults to False.
- classmethod list_synthesizers()
List the available synthesizers.
- Returns
List of available synthesizer names.
- Return type
list[str]
- sample(n_rows)
Sample rows from the synthesizer.
- Parameters
n_rows (int) – The number of rows to create.
- Returns
Data set containing the generated data samples.
- Return type
pd.DataFrame, np.ndarray, or list of tuples
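The create(), fit(), and sample() methods documented above are typically used together. The following is a minimal sketch of that flow, assuming the snsynth package is installed and that "mwem" is one of the names returned by list_synthesizers(); the function name build_and_sample and the chosen epsilon values are illustrative only.

```python
def build_and_sample(private_df, n_rows=100):
    """Create a synthesizer by name, fit it to private data, and draw synthetic rows."""
    # Imported lazily so the sketch can be loaded without snsynth installed.
    from snsynth import Synthesizer

    # Assumption: "mwem" appears in Synthesizer.list_synthesizers().
    synth = Synthesizer.create("mwem", epsilon=1.0)

    # preprocessor_eps is only spent if preprocessing needs privacy budget,
    # for example to infer bounds for continuous columns.
    synth.fit(private_df, preprocessor_eps=0.2)

    # Returns a pd.DataFrame, np.ndarray, or list of tuples.
    return synth.sample(n_rows)
```

Calling fit_sample(private_df, preprocessor_eps=0.2) instead of fit() followed by sample() would return a synthetic dataset with as many rows as private_df.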
- sample_conditional(n_rows, condition, max_tries=100, column_names=None)
Generates a synthetic dataset that satisfies the given condition. Performs a rejection sampling process in which up to max_tries batches are sampled. If not enough valid data samples are found, the partial dataset is returned.
- Parameters
n_rows (int, required) – Number of rows to sample.
condition (str, required) – Condition to evaluate. Needs to be a valid SQL WHERE clause e.g. “age < 50 AND income > 1000”.
max_tries (int, optional) – Maximum number of batches to sample while trying to collect enough valid rows. Defaults to 100.
column_names (list[str], optional) – List of column names, required if the sampled data is not a pd.DataFrame.
- Returns
Data set containing the generated data samples.
- Return type
pd.DataFrame, np.ndarray, or list of tuples
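The rejection sampling process described for sample_conditional can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: draw_batch is a hypothetical stand-in for an unconditional sampler, a Python predicate replaces the SQL WHERE clause, and a batch size of n_rows is assumed.

```python
import random


def sample_conditional_sketch(draw_batch, n_rows, predicate, max_tries=100):
    """Rejection sampling: draw batches until n_rows rows satisfy predicate.

    draw_batch(n) and predicate(row) are hypothetical stand-ins for the
    synthesizer's sampler and for the SQL WHERE condition.
    """
    accepted = []
    for _ in range(max_tries):
        batch = draw_batch(n_rows)  # assumed batch size: n_rows
        accepted.extend(row for row in batch if predicate(row))
        if len(accepted) >= n_rows:
            break
    # If max_tries batches were not enough, this is a partial result.
    return accepted[:n_rows]


rng = random.Random(0)


def draw_batch(n):
    # Toy sampler producing rows with an age and an income column.
    return [{"age": rng.randint(18, 90), "income": rng.randint(0, 5000)} for _ in range(n)]


# Equivalent in spirit to the condition "age < 50 AND income > 1000".
rows = sample_conditional_sketch(
    draw_batch, 20, lambda r: r["age"] < 50 and r["income"] > 1000
)

# A condition that no row can satisfy yields a partial (here empty) dataset.
impossible = sample_conditional_sketch(draw_batch, 20, lambda r: r["age"] > 200, max_tries=5)
```

The second call shows why callers should check the size of the result: when the condition is rare or unsatisfiable, fewer than n_rows rows come back.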