Maximum Spanning Tree (MST)
MST achieves state-of-the-art results for marginals over categorical data, and performs well even with small source data. From McKenna et al., “Winning the NIST Contest: A scalable and general approach to differentially private synthetic data”.
Before using MST, install Private-PGM:
pip install git+https://github.com/ryan112358/private-pgm.git
Then call it like this:
import pandas as pd
from snsynth import Synthesizer
pums = pd.read_csv("PUMS.csv")
synth = Synthesizer.create("mst", epsilon=3.0, verbose=True)
synth.fit(pums, preprocessor_eps=1.0)
pums_synth = synth.sample(1000)
For more, see the sample notebook.
Parameters
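Because MST targets marginals, one quick sanity check on the output is to compare a one-way marginal of a categorical column in the source data against the same column in the synthetic data. The sketch below is not part of snsynth: the helper names `marginal` and `total_variation_distance` are hypothetical, total variation distance is just one possible comparison metric, and the toy lists stand in for columns such as those of `pums` and `pums_synth`.

```python
from collections import Counter


def marginal(values):
    """Return the normalized one-way marginal: value -> relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two marginals given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


# Toy stand-ins for one categorical column of the source and synthetic data.
source_column = ["a", "a", "b", "c"]
synthetic_column = ["a", "b", "b", "c"]

distance = total_variation_distance(
    marginal(source_column), marginal(synthetic_column)
)
print(distance)
```

A distance of 0 means the two marginals are identical and 1 means they share no values. Both helpers accept any iterable of hashable values, so a single pandas column should work as input as well.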
- class snsynth.mst.mst.MSTSynthesizer(epsilon=0.1, delta=1e-09, *ignore, verbose=False)
Maximum Spanning Tree synthesizer, uses Private PGM.
- Parameters
epsilon (float) – privacy budget for the synthesizer
delta (float) – privacy parameter. Should be small, on the order of 1/(n * sqrt(n)), where n is the number of rows in the source data
verbose (bool) – print diagnostic information during processing
Lightly modified from https://github.com/ryan112358/private-pgm/blob/master/mechanisms/mst.py. Awesome work, McKenna et al.!
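As a worked example of the delta guidance above, the following sketch computes 1/(n * sqrt(n)) for a given dataset size. It assumes n is the number of rows in the source data, and the helper name `suggested_delta` is hypothetical rather than part of snsynth.

```python
import math


def suggested_delta(n_rows):
    """Return 1 / (n * sqrt(n)) for a dataset with n_rows records.

    Assumption: n is the row count of the source data.
    """
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return 1.0 / (n_rows * math.sqrt(n_rows))


# For 10,000 rows: 10,000 * sqrt(10,000) = 1,000,000, so delta is about 1e-06.
delta = suggested_delta(10_000)
print(delta)
```

The result could then be supplied as the `delta` argument shown in the class signature above. Note that the default of 1e-09 corresponds to this rule at n = 1,000,000, so smaller datasets yield a larger suggested delta than the default.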