Maximum Spanning Tree (MST)
MST achieves state-of-the-art results for marginals over categorical data and performs well even with small source datasets. It is described in McKenna et al., “Winning the NIST Contest: A scalable and general approach to differentially private synthetic data”.
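The name refers to the structure the mechanism selects: a spanning tree over the attributes, where heavier edges stand for pairs of columns whose 2-way marginals are judged more important to measure. The sketch below shows only that tree-building step: Kruskal's algorithm, taking the heaviest edges first. It is non-private and illustrative. The function `maximum_spanning_tree`, the attribute names, and the scores are all hypothetical and are not part of snsynth or Private-PGM. In the paper's mechanism, the edge scores come from noisy measurements, and each edge is chosen with the exponential mechanism rather than greedily.

```python
from typing import Dict, List, Tuple


def maximum_spanning_tree(
    attributes: List[str],
    weights: Dict[Tuple[str, str], float],
) -> List[Tuple[str, str]]:
    """Kruskal's algorithm, visiting the heaviest candidate edges first."""
    # Union-find forest: every attribute starts in its own component.
    parent = {a: a for a in attributes}

    def find(a: str) -> str:
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree: List[Tuple[str, str]] = []
    # Visit candidate pairs from heaviest to lightest score.
    for (a, b), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:  # Keep only edges that join two different components.
            parent[ra] = rb
            tree.append((a, b))
            if len(tree) == len(attributes) - 1:
                break  # A spanning tree over k attributes has k - 1 edges.
    return tree


# Hypothetical attributes and pairwise scores, for illustration only.
attrs = ["age", "sex", "educ", "income"]
scores = {
    ("age", "sex"): 0.1,
    ("age", "educ"): 0.6,
    ("age", "income"): 0.4,
    ("sex", "educ"): 0.2,
    ("sex", "income"): 0.3,
    ("educ", "income"): 0.9,
}
print(maximum_spanning_tree(attrs, scores))
# → [('educ', 'income'), ('age', 'educ'), ('sex', 'income')]
```

The edge ("age", "income") is skipped even though it outscores the last edge taken, because "age" and "income" are already connected through "educ". Adding it would create a cycle.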
Before using MST, install Private-PGM:
pip install git+https://github.com/ryan112358/private-pgm.git
Then call it like this:
import pandas as pd
from snsynth import Synthesizer

pums = pd.read_csv("PUMS.csv")
synth = Synthesizer.create("mst", epsilon=3.0, verbose=True)
synth.fit(pums, preprocessor_eps=1.0)
pums_synth = synth.sample(1000)
For more, see the sample notebook.
class snsynth.mst.mst.MSTSynthesizer(epsilon=0.1, delta=1e-09, *ignore, verbose=False)
Maximum Spanning Tree synthesizer that uses Private-PGM.
epsilon (float) – privacy budget for the synthesizer
delta (float) – privacy parameter. Should be small, on the order of 1/(n * sqrt(n)), where n is the number of records in the training data.
verbose (bool) – print diagnostic information during processing
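To make the delta guideline concrete, the short sketch below evaluates 1/(n * sqrt(n)) for a given row count. `suggested_delta` is a hypothetical helper written for this illustration and is not part of snsynth. It also assumes that n means the number of rows in the data passed to `fit`.

```python
import math


def suggested_delta(n_rows: int) -> float:
    """Hypothetical helper: the 1/(n * sqrt(n)) rule of thumb for delta."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return 1.0 / (n_rows * math.sqrt(n_rows))


print(suggested_delta(10_000))     # 10_000 * 100 = 1e6, so delta = 1e-06
print(suggested_delta(1_000_000))  # 1e6 * 1e3 = 1e9, so delta = 1e-09
```

By this rule, the default delta=1e-09 corresponds to a dataset of about one million rows. Smaller datasets would get a larger suggested delta.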
Reuses and lightly modifies code from https://github.com/ryan112358/private-pgm/blob/master/mechanisms/mst.py. Awesome work, McKenna et al.!