Maximum Spanning Tree (MST)

MST achieves state-of-the-art results for marginals over categorical data and performs well even with small source data. It is described in McKenna et al., “Winning the NIST Contest: A scalable and general approach to differentially private synthetic data”.

Before using MST, install Private-PGM:

pip install git+https://github.com/ryan112358/private-pgm.git

Then call it like this:

import pandas as pd
from snsynth import Synthesizer

pums = pd.read_csv("PUMS.csv")

synth = Synthesizer.create("mst", epsilon=3.0, verbose=True)
synth.fit(pums, preprocessor_eps=1.0)
pums_synth = synth.sample(1000)

For more, see the sample notebook.
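Since MST targets marginals over categorical data, one rough way to sanity-check the output is to compare a one-way marginal of the source data with that of the synthetic data. The sketch below is only an illustration and is not part of snsynth: the helper names `one_way_marginal` and `total_variation` are hypothetical, and total variation distance is just one possible choice of metric.

```python
from collections import Counter


def one_way_marginal(values):
    """Return the empirical distribution {category: probability} of a column."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute a marginal over an empty column")
    return {category: count / total for category, count in counts.items()}


def total_variation(real_values, synth_values):
    """Total variation distance between two one-way marginals (0 = identical)."""
    p = one_way_marginal(real_values)
    q = one_way_marginal(synth_values)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Toy columns standing in for one real and one synthetic categorical column.
real = ["a", "a", "b", "b"]
synth = ["a", "a", "a", "b"]
tv = total_variation(real, synth)
```

With the DataFrames from the example above, the same helper could be applied to any column name `col` present in both frames, for instance `total_variation(pums[col], pums_synth[col])`. Smaller values indicate that the synthetic marginal is closer to the source marginal.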

Parameters

class snsynth.mst.mst.MSTSynthesizer(epsilon=0.1, delta=1e-09, *ignore, verbose=False)

Maximum Spanning Tree synthesizer, built on Private-PGM.

Parameters
  • epsilon (float) – privacy budget for the synthesizer

  • delta (float) – privacy parameter. Should be small, on the order of 1/(n * sqrt(n)), where n is the number of records in the source data

  • verbose (bool) – print diagnostic information during processing
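The guidance for delta above can be turned into a small calculation. The following is a minimal sketch, assuming n is the number of records in the source data; the helper name `suggested_delta` is hypothetical and not part of snsynth.

```python
import math


def suggested_delta(n_rows: int) -> float:
    """Return 1 / (n * sqrt(n)) for a dataset with n_rows records."""
    if n_rows <= 0:
        raise ValueError("n_rows must be a positive integer")
    return 1.0 / (n_rows * math.sqrt(n_rows))


# For 10,000 records: 10,000 * sqrt(10,000) = 10,000 * 100 = 1e6.
delta = suggested_delta(10_000)
```

The result could then be passed to the constructor documented above, for example `MSTSynthesizer(epsilon=3.0, delta=delta)`. Note that the default of 1e-09 corresponds to this rule for a dataset of about one million records.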

Reuses, with light modifications, code from https://github.com/ryan112358/private-pgm/blob/master/mechanisms/mst.py. Awesome work, McKenna et al.!